Abstract

There has been a resurgence of interest in lower bounds whose truth rests on the conjectured hardness 
of well known computational problems. These conditional lower bounds have become important and 
popular due to the painfully slow progress on proving strong unconditional lower bounds. Nevertheless, 
the long term goal is to replace these conditional bounds with unconditional ones. In this paper we make 
progress in this direction by studying the cell probe complexity of two problems of particular importance
that are conjectured to be hard: matrix-vector multiplication and a version of dynamic set disjointness known
as Pătraşcu's Multiphase Problem. We give improved unconditional lower bounds for these problems and
introduce new proof techniques of independent interest. These include a technique capable of
proving strong threshold lower bounds of the following form: If we insist on having a very fast query 
time, then the update time has to be slow enough to compute a lookup table with the answer to every 
possible query. This is the first time a lower bound of this type has been proven. 
1 Introduction


Proving lower bounds for basic computational problems is one of the most challenging tasks within computer
science. While optimal bounds can often be found for space requirements, we are still a long way from being
able to establish similar results for time complexity for all but a relatively small subset of the problems
we wish to study. Due to the difficulty in obtaining these lower bounds, in recent years there has been
a resurgence of interest in finding bounds which hold conditioned on the conjectured hardness of a small
number of widely studied problems. Perhaps the most prominent examples are 3SUM-hardness (see e.g. [?]),
reductions from the Strong Exponential Time Hypothesis (SETH) [23, 28, ?] and, specifically for
dynamic problems, reductions from a version of dynamic set disjointness known as Pătraşcu's Multiphase
Problem [22] and most recently online Boolean matrix-vector multiplication [14]. Of course the holy grail
remains to prove strong unconditional lower bounds for these problems. Unfortunately the state-of-the-art
techniques for proving lower bounds for data structure problems such as Boolean matrix-vector multiplication
can only prove time lower bounds of $\Omega(\lg m)$, where $m$ is the number of queries to the problem. For the
online Boolean matrix-vector multiplication problem there are $2^n$ queries, which means we cannot hope to
prove bounds beyond $\Omega(n)$ without groundbreaking new insight. This is quite disappointing given that the
conjectured complexity of the problem is $n^{2-o(1)}$.

In this paper we add to the understanding of the true complexity of dynamic and online problems by 
giving new unconditional lower bounds for Pătraşcu's Multiphase Problem as well as online and dynamic
matrix-vector multiplication over finite fields. Our focus is to prove unconditional polynomial lower bounds 
for restricted ranges of trade-offs between update time, query time and space. 

For Pătraşcu's Multiphase Problem, we prove a new type of threshold lower bound saying that if we
insist on having a very fast query time, then the update time essentially has to be high enough to compute 
a lookup table of the answer to every possible query. This is the first threshold lower bound of this form. 

For matrix-vector multiplication, the lower bounds we prove demonstrate that if a data structure doesn't
explicitly try to exploit that it is dealing with a small finite field, then it is doomed to spend nearly
quadratic time per operation. Furthermore, our lower bounds are as strong as current techniques allow.
Matrix-vector multiplication is a basic computational primitive in applied mathematics and so our new
bounds for this problem are also of separate and independent interest.

The lower bounds we prove are all in the cell probe model of computation. We present this model in the 
following. 

Cell probe model 

A data structure in the cell probe model consists of a set of memory cells, each storing $w$ bits. Each cell of
the data structure is identified by an integer address, which is assumed to fit in $w$ bits, that is, each address
is in $[2^w] = \{0, \ldots, 2^w - 1\}$. So that a cell has enough bits to address any update operation performed
on it, we will assume $w \in \Omega(\lg n)$ when analysing a data structure's performance on a sequence of $n$ updates.

During an update operation, the data structure reads and updates a number of the stored cells to reflect 
any changes. The cell read or written to in each step of an update operation may depend arbitrarily on both 
the update and the contents of all cells previously probed during the update. The update time of a data 
structure is defined as the number of cells probed, that is read or written to, when processing an update. 

In order to answer a query, a data structure probes a number of cells. From
the contents of the probed cells, the data structure must return an answer to the query. As with update 
operations, the cell probed at each step, and the answer returned, may be an arbitrary function of the query 
and the previously probed cells. We define the query time of a data structure as the number of cells probed 
when answering a query. 

The cell probe model was introduced originally by Minsky and Papert [?] in a different context and then
subsequently by Fredman [?] and Yao [30]. The generality of the cell probe model makes it particularly
attractive for establishing lower bounds for dynamic data structure problems. The cell probe model, for
example, subsumes the popular word-RAM model.
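
As a rough illustration of this cost model, the following Python sketch counts probes for a toy memory of $w$-bit cells. The class name `CellProbeMemory`, the helper `counter_increment` and the convention that unwritten cells read as zero are assumptions made for the example, not part of the model's definition. As in the model, only reads and writes of cells are charged; all other computation is free.

```python
class CellProbeMemory:
    """Toy cell-probe memory: w-bit cells, cost = number of cells probed."""

    def __init__(self, w):
        self.w = w
        self.cells = {}   # address -> contents; unwritten cells read as 0 (assumption)
        self.probes = 0   # total number of cells read or written so far

    def _check(self, value):
        # Addresses and contents must both fit in w bits.
        if not 0 <= value < (1 << self.w):
            raise ValueError("value does not fit in w bits")

    def read(self, addr):
        self._check(addr)
        self.probes += 1
        return self.cells.get(addr, 0)

    def write(self, addr, value):
        self._check(addr)
        self._check(value)
        self.probes += 1
        self.cells[addr] = value


def counter_increment(mem, addr):
    """One hypothetical 'update': read a cell and write it back (two probes)."""
    value = mem.read(addr)
    mem.write(addr, (value + 1) % (1 << mem.w))


mem = CellProbeMemory(w=8)
counter_increment(mem, 3)
counter_increment(mem, 3)
update_cost = mem.probes  # two updates, two probes each
```

Here the update time of `counter_increment` is two probes regardless of how much arithmetic is done between the read and the write.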
Previous cell probe lower bounds

The main approaches for proving dynamic data structure lower bounds in the cell probe model have
historically been based on the chronogram technique of Fredman and Saks [?], which until approximately a
decade ago was able to prove $\Omega(\lg n/\lg\lg n)$ lower bounds at best. This technique was based on partitioning
a sequence of updates into epochs of geometrically decreasing size and then arguing that, amongst the cells
updated during each epoch, any correct data structure has to probe $\Omega(1)$ of them. In 2004 a breakthrough
led by Pătraşcu and Demaine developed the information transfer technique which gave the first $\Omega(\lg n)$
lower bound per operation for several data structure problems [26]. Later on it was also shown that an
$\Omega(\lg n)$ time lower bound can be derived using the same approach for the related questions of streaming
and online computation, including multiplication and various string matching problems [3, ?]. The key
difference between the streaming and online problems and a standard dynamic data structure setting is that
although there are still many different possible updates at each step, there is only one query which is simply
to output the latest result.

In 2012 there was another breakthrough for dynamic data structure lower bounds. The new idea was
to combine the cell sampling approach of Panigrahy et al. [20] with the chronogram technique of Fredman
and Saks. In essence, this approach allows one to argue that when answering a query, one has to probe
$\Omega(\lg m/\lg(wt_u))$ cells from each epoch instead of $\Omega(1)$. With around $\lg n/\lg(wt_u)$ epochs, this gives lower
bounds of roughly $\Omega(\lg n\lg m/(\lg(wt_u))^2)$. Here $m$ is the number of queries that can be asked in the data
structure problem. This resulted in a $t_q = \Omega((\lg n/\lg(wt_u))^2)$ lower bound for dynamic weighted range
counting and $t_q = \Omega(\lg|\mathbb{F}|\lg n/(\lg(wt_u/\lg|\mathbb{F}|)\lg(wt_u)))$ for dynamic polynomial evaluation when computing
over a field $\mathbb{F}$ of size at least $\Omega(n^2)$ [?]. This latter bound was, until this current work, the only such
bound that holds for randomised data structures which can err with constant probability. Perhaps due to
the technical difficulties involved, no further lower bounds of this form have been shown to date.

Attacking the problem of finding lower bounds from a different angle, Pătraşcu and Thorup showed a
sharp query/update time trade-off for dynamic connectivity in undirected graphs. They showed that any
data structure that supports edge insertions in $o(\lg n)$ probes must have worst case connectivity query time
$n^{1-o(1)}$ in the cell probe model, assuming cells of $O(\lg n)$ bits [24]. In other words, really fast updates imply nearly
naive running time for queries.

Towards the aim of giving yet higher lower bounds, in [22] Pătraşcu introduced a dynamic version of set
disjointness which he termed the Multiphase Problem. He showed reductions for this problem, first from 
3SUM and then to dynamic reachability, dynamic shortest path as well as subgraph connectivity and other 
problems of general interest. Assuming that there is no truly sub-quadratic time solution for 3SUM, he was 
then able to give the first known polynomial time lower bounds for many dynamic data structure problems. 

Online matrix-vector multiplication [14] can also be viewed as a static problem in classic data structure
terminology, that is we receive some data to preprocess (a matrix) and then we answer queries (vectors). 
Thus we find it relevant to also list previous techniques and barriers for proving static cell probe lower 
bounds. 

One of the early techniques for proving static lower bounds was based on a reduction from asymmetric
communication complexity by Miltersen et al. [?]. This technique led to lower bounds of the form
$\Omega(\lg m/\lg S)$ where $m$ is the number of queries in the data structure problem and $S$ is the space usage. For
most natural data structure problems, $m$ is only polynomial in the input size $n$ and $S \ge n$. This means that
for most problems, the lower bound degenerates to $\Omega(1)$.

This barrier was overcome in the seminal papers of Pătraşcu and Thorup [?], where they
introduced a refined reduction from communication complexity that pushed the barrier to lower bounds of
$t = \Omega(\lg m/\lg(Sw/n))$. Extending upon ideas of Panigrahy et al. [20], Larsen [16] tweaked their cell
sampling technique to give slightly higher lower bounds of $\Omega(\lg m/\lg(S/n))$. This remains the highest static
lower bound to date.
1.1 Our Results


Pătraşcu's Multiphase Problem. In the Multiphase Problem, we have three phases. In Phase I, we
receive $k$ subsets $X_1, \ldots, X_k$ of a universe $[n]$ and must preprocess these into a data structure. In Phase II,
we receive another set $Y \subseteq [n]$ and we are allowed to update our data structure based on this set $Y$. Finally,
in Phase III, we receive an index $i \in [k]$ and the goal is to return whether $X_i \cap Y = \emptyset$. The three performance
metrics of interest to us are the following: The space usage, $S$, is defined as the number of memory cells
of $w = \Omega(\lg k)$ bits used by the data structure produced in Phase I. The update time, $t_u$, is the number of
probes used in Phase II. The query time, $t_q$, is the number of probes spent in Phase III.

As mentioned earlier, Pătraşcu showed hardness results for the Multiphase Problem by a reduction from
3SUM. His reduction shows that for $k = \Theta(n^{2.5})$, it is 3SUM-hard to design a word-RAM data structure for
the Multiphase Problem that simultaneously spends $kn^{1.5-\Omega(1)}$ time in Phase I, $n^{1.5-\Omega(1)}$ time in Phase II
and $n^{0.5-\Omega(1)}$ time in Phase III. Proving such polynomial lower bounds in the cell probe model is far out of
reach. Nevertheless, we still find it extremely important to see what actually can be said unconditionally
and try to understand the limitations of our techniques better.

In Section 2 we introduce a new technique for proving strong threshold lower bounds for dynamic data
structures. We apply our technique to the Multiphase Problem and show the following: Any cell probe data
structure for the Multiphase Problem with $w^4 \le n \le k$, using space $kn^{O(1)}$ cells of $w = \Omega(\lg k)$ bits and
answering queries in $o(\lg k/\lg n)$ probes, must have $t_u = k^{1-o(1)}/w$. This lower bound holds even if the set
$Y$ inserted in Phase II has size $O(\lg k/\lg n)$.

In the most natural case of $w = \Theta(\lg k)$, we can set $n = \lg^4 k$ and the lower bound says that any
data structure for the Multiphase Problem with $\lg^4 k$-sized sets, which uses $k\lg^{O(1)} k$ words of space and
supports queries in $o(\lg k/\lg\lg k)$ time, must have update time $k^{1-o(1)}$. And this applies even if $Y$ has size
$O(\lg k/\lg\lg k)$. This lower bound has quite a remarkable statement: If we want to do anything better in
Phase III than checking each element in $Y$ one at a time for inclusion in $X_i$, then Phase II has to compute
a table of all the answers to all the $k$ possible queries. There is essentially no strategy in-between the two
extremes.
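
The two extreme strategies can be made concrete with a small Python sketch. The function names below are hypothetical, Python sets stand in for the data structure built in Phase I, and each membership test is counted as a single probe, which is a simplifying assumption rather than a cell probe implementation.

```python
def phase1(sets):
    """Phase I: store the k sets X_1..X_k (here simply as Python sets)."""
    return [set(x) for x in sets]


# Extreme A: cheap update, query scans Y element by element.
def phase2_store(xs, y):
    return sorted(y)  # work proportional to |Y|


def phase3_scan(xs, stored_y, i):
    probes = 0
    for element in stored_y:
        probes += 1  # one membership test in X_i, counted as one probe
        if element in xs[i]:
            return False, probes  # X_i and Y intersect
    return True, probes  # X_i and Y are disjoint


# Extreme B: expensive update builds a table of all k answers, query is one lookup.
def phase2_table(xs, y):
    y = set(y)
    return [x.isdisjoint(y) for x in xs]  # work proportional to k


def phase3_lookup(table, i):
    return table[i], 1


xs = phase1([{0, 1}, {2, 3}, {4}])
y = {1, 4}
stored = phase2_store(xs, y)
table = phase2_table(xs, y)
```

Strategy A spends about $|Y|$ probes per query, while strategy B spends one probe per query at the price of touching all $k$ table entries during the update.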

The previous result that comes closest in spirit to our new lower bound is the threshold result of
Pătraşcu and Thorup [24], showing that any data structure for dynamic connectivity in undirected graphs
with $n$ nodes, having update time $t_u = o(\lg n)$, must have query time $t_q = n^{1-o(1)}$. Thus their lower bound
is essentially the opposite way around.

Since our lower bound is proved for the Multiphase Problem, we immediately get a similar lower bound
for a number of problems, simply by reusing the previous conditional hardness reductions. We mention two
examples from [22] here: For dynamic connectivity in directed graphs with $n$ nodes and $m = n\lg^{O(1)} n$ edges,
any data structure using $m\lg^{O(1)} m$ space and supporting connectivity queries in $o(\lg n/\lg\lg n)$ time must
have update time $n^{1-o(1)}$. For dynamic shortest paths in undirected graphs with $n$ nodes and $m = n\lg^{O(1)} n$
edges, any data structure using $m\lg^{O(1)} m$ space and supporting distance queries in $o(\lg n/\lg\lg n)$ time
must have update time $n^{1-o(1)}$. Both lower bounds hold even if one node in the connectivity/distance query
is a fixed source node (common to all queries).

Online matrix-vector multiplication. Given an $n \times n$ matrix $M$ with coefficients from a finite field $\mathbb{F}$,
preprocess $M$ into a data structure, such that when given a query vector $v \in \mathbb{F}^n$, we can quickly compute
$Mv$.

In Section 3 we show a lower bound of

$$t = \Omega\left(\min\left\{\frac{n\lg|\mathbb{F}|}{\lg\frac{Sw}{n^2\lg|\mathbb{F}|}},\ \frac{n^2\lg|\mathbb{F}|}{w}\right\}\right)$$

cell probes to compute $Mv$ for a query vector $v \in \mathbb{F}^n$. This holds even if the data structure is allowed to err
with probability $1 - 1/|\mathbb{F}|^{n/4}$ on average over all pairs of a matrix $M$ and a query vector $v$. This is the first
lower bound of this type which applies even under such extreme probability of error.
For the natural range of parameters $|\mathbb{F}| = n^{\Omega(1)}$ and $w = \Theta(\lg|\mathbb{F}|)$, the lower bound simplifies to
$t = \Omega(\min\{n\lg|\mathbb{F}|/\lg(S/n^2), n^2\})$ and is the strongest current techniques can show for a static problem with $|\mathbb{F}|^n$
queries. For linear space, this is $t = \Omega(\min\{n\lg|\mathbb{F}|, n^2\})$. As the size of the field $\mathbb{F}$ tends to $2^n$, the lower
bound says that any data structure with near-linear space has to "read" the entire matrix to compute $Mv$,
even if allowed to err with overwhelming probability. While this might sound odd at first, note that it also
means that any data structure that doesn't explicitly try to exploit that it is dealing with a small field is
doomed to use nearly quadratic time to compute $Mv$.

Frandsen et al. [?] also proved lower bounds for online matrix-vector multiplication, where the first term
in the min-expression above is replaced by $n\lg|\mathbb{F}|/\lg n$. Their lower bound thus also shows that as the field
size grows, the trivial solution is the only option. Comparing their lower bound to ours, we see that for linear
space, our lower bound is a factor $\lg n$ stronger. Furthermore, their lower bound holds only for deterministic
data structures, whereas ours holds even for data structures that return the correct answer with only
exponentially small probability.


Dynamic online matrix-vector multiplication. Maintain an $n \times n$ matrix $M$ with coefficients from a
finite field $\mathbb{F}$ under

• updates of the form $M_{i,j} \leftarrow x$ for a row index $i$, column index $j$ and an $x \in \mathbb{F}$;

• matrix-vector queries: given an $n$-vector $v$, return the product $Mv$.
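
For orientation, the trivial solution to this problem can be sketched in a few lines of Python. The class name `NaiveDynamicMatVec` is hypothetical, and the sketch assumes a prime field $\mathbb{F}_p$ represented by integers modulo $p$, whereas the problem statement allows any finite field.

```python
class NaiveDynamicMatVec:
    """Trivial baseline over the prime field F_p (assumption: p is prime).

    An update overwrites one entry; a query reads the whole matrix,
    so it touches all n^2 entries.
    """

    def __init__(self, n, p):
        self.n = n
        self.p = p
        self.m = [[0] * n for _ in range(n)]  # the all-zero matrix

    def update(self, i, j, x):
        # M[i][j] <- x, reduced into the field.
        self.m[i][j] = x % self.p

    def query(self, v):
        if len(v) != self.n:
            raise ValueError("query vector must have length n")
        return [sum(a * b for a, b in zip(row, v)) % self.p for row in self.m]


ds = NaiveDynamicMatVec(n=2, p=5)
ds.update(0, 0, 3)
ds.update(0, 1, 4)
ds.update(1, 1, 2)
result = ds.query([1, 2])  # row 0: 3*1 + 4*2 = 11 = 1 mod 5; row 1: 2*2 = 4
```

This baseline has constant update cost and reads every matrix entry on a query, which is the behaviour the lower bounds in this paper are compared against.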

In Section 4 we prove that any cell probe data structure for the dynamic online matrix-vector multiplication
problem on an $n \times n$ matrix $M$, with $w$-bit cells and worst case update time $t_u$, must use

$$t_q = \Omega\left(\min\left\{\frac{n\lg|\mathbb{F}|\lg(n/w)}{\lg^2(t_u w/\lg|\mathbb{F}|)},\ \frac{n^2\lg|\mathbb{F}|}{w}\right\}\right)$$

cell probes to compute $Mv$ for a query vector $v \in \mathbb{F}^n$. This holds if the data structure errs with probability
no more than $1/3$ when answering any query vector $v$ after any sequence of $n^2$ updates. The lower bound
we prove for dynamic online matrix-vector multiplication equals the highest that has ever been achieved
for any dynamic data structure problem (in fact it is slightly stronger than any previous bound for update 
time tu = 0(lg|F|/w), making it the strongest to date). It is also only the second example, after [16], of 
such a lower bound that holds under constant probability of error. 

Given that progress on proving these $(\lg m\lg n)$-type dynamic lower bounds has been very slow, we find it
an important contribution in itself to give a new lower bound of this form. We hope that the proof eventually
will inspire new ways of proving lower bounds and will push the barriers further. In particular, one of the
biggest problems with the current lower bound technique of Larsen [?] is that it can only be applied for
problems where the answer to a query carries more information (bits/entropy) than it takes to describe a 
query. This in particular implies that the technique cannot be applied to decision problems. Our proof of 
the above lower bound makes some progress on this frontier. In the proof, we eventually end up with a 
collection of queries whose answers “reveal” only a small constant fraction of the “information” needed to 
describe them. We elegantly circumvent the limitations of the lower bound technique by using a randomized 
encoding argument that allows us to save a constant fraction in the “description size” of the queries. We 
refer the reader to the proof itself for the details. 


2 Threshold Bounds for The Multiphase Problem 

We prove our lower bound for the Multiphase Problem in the cell probe model. The lower bound we prove
is the following:

Theorem 1. Any cell probe data structure for the Multiphase Problem on $k$ sets from a universe $[n]$, where
$w^4 \le n \le k$, using $kn^{O(1)}$ cells of $w = \Omega(\lg k)$ bits of space and answering queries in $t_q = o(\lg k/\lg n)$ probes,
must have $t_u = k^{1-o(1)}/w$. This holds even if the update set in Phase II contains $O(\lg k/\lg n)$ elements.



We prove Theorem 1 by a reduction from a variant of the communication game Lop-sided Set Disjointness,
or LSD for short. In LSD, we have two players Alice and Bob. Alice and Bob receive subsets $V$ and $W$ of
a universe $[U]$ and must determine whether $V \cap W = \emptyset$ while minimizing their communication. The term
Lop-sided stems from Alice's set having size $N$, where $NB = U$ for some value $B > 1$. As mentioned, we use a
variant of LSD known as Blocked-LSD [21]. In Blocked-LSD, the universe is the Cartesian product $[N] \times [B]$.
Bob receives a subset $W$ of $[N] \times [B]$, which may be of arbitrary size. Alice's set $V$ satisfies that for all
$j \in [N]$, there is exactly one $b_j \in [B]$ such that $(j, b_j) \in V$, i.e. $V$ has the form $\{(0, b_0), \ldots, (N-1, b_{N-1})\}$.
Pătraşcu proved the following lower bound for Blocked-LSD:

Theorem 2 (Pătraşcu [21]). Fix $\delta > 0$. In any deterministic protocol for Blocked-LSD, either Alice sends
at least $\delta N\lg B$ bits or Bob sends at least $NB^{1-O(\delta)}$ bits.

In our reduction, we will need the following lemma: 

Lemma 1. Consider a communication game in which Bob receives a set $B \subseteq [2^w]$ of size $S$ and Alice
receives a set $A \subseteq B$ of size $k$. There is a deterministic protocol in which Alice sends $O(k\lg(S/k))$ bits, Bob
sends $O(kw)$ bits, and after communicating, Bob knows Alice's set $A$.

The proof of Lemma 1 is based on a simple application of hashing and is given in Section 2.1. We note
that a similar trick has been used by Miltersen [?], but only for $k = 1$. His proof thus "costs" $\lg S$ bits in
Alice's communication per element in $A$, whereas we shave this down to $\lg(S/k)$ bits. We are now ready to
give the reduction from Blocked-LSD to the Multiphase Problem.

Proof (of Theorem 1). Assume we have a data structure $\mathcal{D}$ for the Multiphase Problem with $k$ sets from the
universe $[n]$, where $w^4 \le n \le k$. Let $S$ be the space usage of $\mathcal{D}$ in number of cells of $w = \Omega(\lg k)$ bits each.
Let $t_q$ be its query time and $t_u$ its update time. We assume $S = kn^{O(1)}$ and $t_q = o(\lg k/\lg n)$ and show
this implies $t_u = k^{1-o(1)}/w$. Note that for this setting of parameters, we have $n = k^{o(1)}$ since otherwise it is
impossible to have $t_q = o(\lg k/\lg n)$.

Define $\ell = \sqrt{t_q\lg k/\lg n}$. Since we assumed $t_q = o(\lg k/\lg n)$, we have $\ell = o(\lg k/\lg n)$ and $\ell = \omega(t_q)$.
We use $\mathcal{D}$ to give an efficient communication protocol for Blocked-LSD on the universe $[k\ell] \times [n/\ell]$. For this
setting of parameters, Theorem 2 says that either Alice sends $\Omega(k\ell\lg(n/\ell)) = \omega(kt_q\lg n)$ bits, or Bob sends
at least $k\ell(w^3/\ell) = \omega(kw^2)$ bits.

Alice receives $V$ and Bob receives $W$, both subsets of $[k\ell] \times [n/\ell]$. Alice's set $V$ satisfies that for all
$j \in [k\ell]$, there is exactly one $b_j \in [n/\ell]$ such that $(j, b_j) \in V$. Alice and Bob now conceptually partition
$[k\ell]$ into $k$ consecutive groups $G_1, \ldots, G_k$ of $\ell$ elements each, i.e. the first group is $G_1 = \{0, \ldots, \ell - 1\}$, the
second is $G_2 = \{\ell, \ldots, 2\ell - 1\}$ etc. For $i = 1, \ldots, k$ we let $V_i$ denote the subset of pairs $(j, b_j) \in V$ for which
$j \in G_i$. Similarly we let $W_i$ denote the subset of pairs $(j, h) \in W$ for which $j \in G_i$. Observe that $|V_i| = \ell$
for each $i$. There is no size bound on $W_i$ other than the trivial bound $|W_i| \le \ell(n/\ell) = n$.
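
The two asymptotic properties of the block length $\ell = \sqrt{t_q\lg k/\lg n}$ used in this proof follow from a one-line calculation, spelled out here for convenience. The shorthand $L$ is introduced only for this derivation. The last line additionally uses that $\ell \le \lg k = O(w)$ is polynomially smaller than $n$ under the assumption relating $w$ and $n$ in Theorem 1, so that $\lg(n/\ell) = \Omega(\lg n)$.

```latex
% Assumption: L is shorthand for \lg k / \lg n (not notation from the paper).
\[
  L = \frac{\lg k}{\lg n}, \qquad t_q = o(L), \qquad \ell = \sqrt{t_q L}.
\]
\[
  \frac{\ell}{L} = \sqrt{\frac{t_q}{L}} \to 0
  \;\Longrightarrow\; \ell = o(L),
  \qquad
  \frac{\ell}{t_q} = \sqrt{\frac{L}{t_q}} \to \infty
  \;\Longrightarrow\; \ell = \omega(t_q).
\]
\[
  k \ell \lg(n/\ell) = \Omega(k \ell \lg n) = \omega(k\, t_q \lg n).
\]
```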

Alice now interprets each of the subsets $V_i \subseteq \{(i-1)\ell, \ldots, i\ell - 1\} \times [n/\ell]$ as an $\ell$-sized subset of the
universe $[n]$, denoted $Y_i$. This is done by mapping a pair $(j, b_j) \in V_i$ to the element $(j \bmod \ell)(n/\ell) + b_j$.
Bob similarly interprets each of his subsets $W_i \subseteq \{(i-1)\ell, \ldots, i\ell - 1\} \times [n/\ell]$ as a subset of the universe
$[n]$, denoted $X_i$. He also does this by mapping a pair $(j, h) \in W_i$ to the element $(j \bmod \ell)(n/\ell) + h$. The
crucial property of this reduction is that $V \cap W = \emptyset$ if and only if $X_i \cap Y_i = \emptyset$ for all $i = 1, \ldots, k$. The goal
now is for Alice and Bob to use $\mathcal{D}$ to test whether $X_i \cap Y_i = \emptyset$ for all $i = 1, \ldots, k$ and thereby determine
whether $V \cap W = \emptyset$. This is done using the following protocol:

1. Bob starts by running Phase I of the Multiphase Problem on $\mathcal{D}$ with the sets $X_1, \ldots, X_k$ as input. This
creates a data structure using only $S = kn^{O(1)}$ memory cells. Note that Bob does not communicate
with Alice in this step and thus the constructed data structure is only known to Bob.

2. Alice now iterates through all $\ell$-sized subsets of $[n]$. For each such subset $Y$, she runs Phase II of
the Multiphase Problem on the data structure held by Bob with $Y$ as input. This is done as follows:
For a subset $Y$, Alice first initializes an empty set of cells $C(X_1, \ldots, X_k, Y)$. She then starts running
the update algorithm of $\mathcal{D}$ with $Y$ as input. Each step of this algorithm either requests a memory cell or
overwrites the contents of a memory cell. In the latter case, Alice stores the overwritten cell in $C(X_1, \ldots, X_k, Y)$,
including its address and its new contents. In the first case, Alice checks whether the requested
cell is in $C(X_1, \ldots, X_k, Y)$. If so, she has the contents herself and can continue running the update
algorithm. Otherwise, she asks Bob for the contents of the cell by sending him $w$ bits specifying the
cell's address. Bob then replies with its $w$ bits of contents. When this terminates, each of the cell
sets $C(X_1, \ldots, X_k, Y)$ held by Alice stores the contents and addresses of every cell that is updated if
running Phase II on $\mathcal{D}$ with $Y$ as input, after having run Phase I on $\mathcal{D}$ with the sets $X_1, \ldots, X_k$ as
input. Since $\mathcal{D}$ has update time $t_u$, Alice and Bob both send no more than $t_u w$ bits for each $\ell$-sized
subset of $[n]$. Since we chose $\ell = o(\lg k/\lg n)$, there are at most $n^\ell = k^{o(1)}$ such subsets, so this is no more
than $k^{o(1)} t_u w$ bits in total. Note that
Bob performs no other actions in this step than to reply to Alice with the contents of the requested
cells (with the contents right after processing $X_1, \ldots, X_k$ in Phase I).

3. Alice now runs the Phase III query algorithm of $\mathcal{D}$ for every possible query $i \in [k]$ in parallel. The
execution for a query index $i$ will be run as if the updates $Y_i$ had been performed in Phase II. The
query $i$ will thus return whether $X_i \cap Y_i = \emptyset$. More formally, Alice does as follows: For $t = 1, \ldots, t_q$
in turn, Alice will simulate the $t$'th probe of $\mathcal{D}$ for every query $i \in [k]$. She will do this so that the
execution is identical to having run the updates $Y_i$ in Phase II. For the $t$'th probe, the query algorithm
of $\mathcal{D}$ requests a memory cell $c_{t,i}$ for each $i$. For the cell $c_{t,i}$, she checks whether that cell is contained
in $C(X_1, \ldots, X_k, Y_i)$. If so, she has the contents of the cell as if update $Y_i$ was performed in Phase II
and she can continue to the next probe for that $i$ without communicating with Bob. If not, she knows
that the contents of $c_{t,i}$ was not changed when performing the updates $Y_i$ in Phase II. She then adds
the address of $c_{t,i}$ to a set of addresses $Z_t$. The set $Z_t$ thus holds the addresses of all cells needed to
execute the $t$'th probe for each query $i \in [k]$, and Alice needs the contents of these cells as they were
right after Phase I. Alice will now ask Bob for the contents of all cells in $Z_t$. The point of collecting the
cells needed in one set $Z_t$, rather than asking for them one at a time, is to save on the communication,
i.e. Alice wants to send less than $w$ bits (the address) to Bob per cell in $Z_t$. This is done by invoking
Lemma 1 with the $B$ in Lemma 1 being the addresses of all cells written to in Phase I on input
$X_1, \ldots, X_k$ and $A$ being the set $Z_t$. After using Lemma 1, Bob knows $Z_t$ and sends the contents of all cells
in $Z_t$ to Alice. By Lemma 1, Bob will send $O(kw)$ bits and Alice will send $O(k\lg(S/k)) = O(k\lg n)$
bits. Alice can now continue with probe $t + 1$ and eventually the data structure determines whether
$X_i \cap Y_i = \emptyset$ for each $i$. Since $X_i \cap Y_i = \emptyset$ for each $i$ iff $V \cap W = \emptyset$, this completes the description of the
protocol.

We have thus given a protocol for Blocked-LSD on $[k\ell] \times [n/\ell]$ in which Alice sends $k^{o(1)} t_u w + O(t_q k\lg n)$ bits
and Bob sends $k^{o(1)} t_u w + O(t_q kw) = k^{o(1)} t_u w + o(kw^2)$ bits. But the lower bound says that either Alice must
send $\omega(t_q k\lg n)$ bits or Bob must send $\omega(kw^2)$ bits. This implies $t_u w k^{o(1)} = \omega(k)$, i.e. $t_u = k^{1-o(1)}/w$. □
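
The embedding of a Blocked-LSD instance into $k$ Multiphase instances used in the proof above can be checked on small inputs with the following Python sketch. The function name `blocked_lsd_to_multiphase` and the 0-based group indexing are choices made for this illustration, and the sketch assumes that $\ell$ divides $n$.

```python
def blocked_lsd_to_multiphase(v_pairs, w_pairs, k, ell, n):
    """Split a Blocked-LSD instance on [k*ell] x [n/ell] into k set pairs.

    Returns (xs, ys) where xs[i] (Bob's side) and ys[i] (Alice's side)
    are subsets of range(n). Groups are indexed from 0 here.
    """
    if n % ell != 0:
        raise ValueError("this sketch assumes ell divides n")
    block = n // ell

    def embed(j, h):
        # Pair (j, h) goes to element (j mod ell) * (n / ell) + h of [n].
        return (j % ell) * block + h

    xs = [set() for _ in range(k)]
    ys = [set() for _ in range(k)]
    for j, h in w_pairs:   # Bob's pairs form the sets X_i
        xs[j // ell].add(embed(j, h))
    for j, b in v_pairs:   # Alice's pairs form the sets Y_i
        ys[j // ell].add(embed(j, b))
    return xs, ys


def all_disjoint(xs, ys):
    return all(x.isdisjoint(y) for x, y in zip(xs, ys))


# k = 2 groups of ell = 2 indices each, universe size n = 6 (blocks of size 3).
V = [(0, 1), (1, 2), (2, 0), (3, 1)]          # exactly one pair per index j
W_hit = [(0, 0), (1, 2), (3, 0)]              # shares the pair (1, 2) with V
W_miss = [(0, 0), (3, 0)]                     # shares nothing with V

xs_hit, ys_hit = blocked_lsd_to_multiphase(V, W_hit, k=2, ell=2, n=6)
xs_miss, ys_miss = blocked_lsd_to_multiphase(V, W_miss, k=2, ell=2, n=6)
```

In the first instance the shared pair $(1, 2)$ lands in group 0 on both sides, so one pair of sets intersects; in the second instance every pair of sets is disjoint, matching $V \cap W = \emptyset$.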

2.1 Communicating a Subset (Proof of Lemma 1)

Let $B \subseteq [2^w]$ with $|B| = S$ and let $A \subseteq B$ with $|A| = k$. Bob receives $B$ and Alice receives $A$. Consider
the $2/2^M$-universal hash function $h_a(x) = \lfloor (ax \bmod 2^w)/2^{w-M} \rfloor$ of Dietzfelbinger et al. [?], where $a$ is a
uniform random odd integer less than $2^w$. By $2/2^M$-universal we mean that for any two distinct $x, y \in [2^w]$,
we have $\Pr_a[h_a(x) = h_a(y)] \le 2/2^M$ (note there are $2^M$ possible values $h_a(x)$ can take). Letting $M = \lceil \lg S \rceil$,
we have for any two distinct $b_1, b_2 \in B$ that $\Pr_a[h_a(b_1) = h_a(b_2)] \le 2/S$. The expected number of distinct
pairs $(b_1, b_2) \in B$ for which $h_a(b_1) = h_a(b_2)$ is at most $2S$. Thus there must exist an odd integer $a^* \in [2^w]$
such that the number of pairs $(b_1, b_2) \in B$ where $h_{a^*}(b_1) = h_{a^*}(b_2)$ is at most $2S$. Bob starts by sending Alice
such an odd integer $a^*$, costing $w$ bits of communication from Bob. Alice now computes $h_{a^*}(A) \subseteq [2^M]$ and
sends $h_{a^*}(A)$ to Bob by specifying it as a subset of $[2^M]$. Since $|h_{a^*}(A)| \le k$ and $2^M \le 2S$, this costs at most
$\lg\binom{2S}{k} = O(k\lg(S/k))$ bits. For each $i$ in $h_{a^*}(A)$, Bob computes the set $B_i$ consisting of all elements $b \in B$
such that $h_{a^*}(b) = i$. Since the total number of pairs $b_1, b_2 \in B$ with $h_{a^*}(b_1) = h_{a^*}(b_2)$ is no more than $2S$, we
have $\sum_{i \in h_{a^*}(A)} |B_i|^2 = O(S)$. For each $i \in h_{a^*}(A)$, Bob now picks $M_i = \lceil \lg(8|B_i|^2) \rceil$ and finds an odd integer
$a_i^* \in [2^w]$ such that for all distinct $b_1, b_2 \in B_i$, we have $h_{a_i^*}(b_1) \ne h_{a_i^*}(b_2)$. Bob sends all these $a_i^*$'s to Alice, costing
at most $O(|h_{a^*}(A)|w) = O(kw)$ bits. Finally Alice computes for each $i \in h_{a^*}(A)$ the set $A_i$ of elements $a \in A$
such that $h_{a^*}(a) = i$. For each $A_i$, Alice computes $h_{a_i^*}(A_i) \subseteq [2^{M_i}]$. Since $\sum_{i \in h_{a^*}(A)} 2^{M_i} = O(S)$, Alice can
now send $h_{a_i^*}(A_i)$ to Bob for every $i$ with a total communication of at most $\lg\binom{O(S)}{k} = O(k\lg(S/k))$ bits.
Since $A \subseteq B$ and $h_{a_i^*}(b_1) \ne h_{a_i^*}(b_2)$ for any two distinct $b_1, b_2 \in B_i$, Bob has learned $A$.

In the protocol above, Bob sends $O(kw)$ bits and Alice sends $O(k\lg(S/k))$ bits. Note that the protocol
is deterministic; the randomness of the hash functions is only used to argue that there exists a choice of $a^*$
and the $a_i^*$'s. We have thus proved Lemma 1.
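
The two-level hashing protocol can be simulated end to end with a short Python sketch. The function names are hypothetical, messages are modelled as ordinary Python values rather than bit strings, and the multipliers are found by trying odd integers in increasing order, which is one simple deterministic way of realising the existence argument and is not claimed to be the method intended in the proof.

```python
import random


def ceil_lg(x):
    """ceil(log2(x)) for an integer x >= 1."""
    return (x - 1).bit_length()


def h(a, x, w, M):
    """Multiply-shift hash: floor((a*x mod 2^w) / 2^(w-M)) for odd a."""
    return ((a * x) % (1 << w)) >> (w - M)


def first_level(B, w):
    """Bob: find an odd a* with at most 2|B| ordered colliding pairs."""
    S = len(B)
    M = max(1, ceil_lg(S))
    for a in range(1, 1 << w, 2):
        buckets = {}
        for b in B:
            buckets.setdefault(h(a, b, w, M), []).append(b)
        if sum(len(v) * (len(v) - 1) for v in buckets.values()) <= 2 * S:
            return a, M, buckets
    raise RuntimeError("no suitable multiplier found")


def second_level(bucket, w):
    """Bob: find an odd a_i that is injective on one bucket B_i."""
    M_i = min(w, ceil_lg(8 * len(bucket) ** 2))
    for a in range(1, 1 << w, 2):
        if len({h(a, b, w, M_i) for b in bucket}) == len(bucket):
            return a, M_i
    raise RuntimeError("no injective multiplier found")


def communicate_subset(A, B, w):
    """Simulate the protocol and return the set Bob reconstructs."""
    a_star, M, buckets = first_level(B, w)                 # Bob -> Alice: a*
    hashed_A = {h(a_star, x, w, M) for x in A}             # Alice -> Bob
    params = {i: second_level(buckets[i], w) for i in hashed_A}  # Bob -> Alice
    message = set()                                        # Alice -> Bob
    for x in A:
        i = h(a_star, x, w, M)
        a_i, M_i = params[i]
        message.add((i, h(a_i, x, w, M_i)))
    recovered = set()                                      # Bob decodes
    for i in hashed_A:
        a_i, M_i = params[i]
        for b in buckets[i]:
            if (i, h(a_i, b, w, M_i)) in message:
                recovered.add(b)
    return recovered


rng = random.Random(1)
B = rng.sample(range(1 << 12), 60)   # Bob's set of 60 distinct 12-bit addresses
A = set(B[::7])                      # Alice's subset of B
recovered = communicate_subset(A, B, w=12)
```

Because each second-level hash is injective on its bucket $B_i$ and $A \subseteq B$, the set Bob decodes coincides with $A$.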


3 Online Matrix-Vector Multiplication 

In this section, we consider the online matrix-vector multiplication problem: Given an $n \times n$ matrix $M$ with
coefficients from a finite field $\mathbb{F}$, preprocess $M$ into a data structure, such that when given a query vector
$v \in \mathbb{F}^n$, we can quickly compute $Mv$. We consider the problem in the cell probe model with $w$-bit cells where
$w$ is assumed to be at least $\lg n$ and at least $\lg|\mathbb{F}|$. Our lower bound is as follows:

Theorem 3. Any cell probe data structure for the online matrix-vector multiplication problem, using $S$ cells
of $w$ bits of space to store an $n \times n$ matrix with coefficients from a finite field $\mathbb{F}$, must use

$$t = \Omega\left(\min\left\{\frac{n\lg|\mathbb{F}|}{\lg\frac{Sw}{n^2\lg|\mathbb{F}|}},\ \frac{n^2\lg|\mathbb{F}|}{w}\right\}\right)$$

cell probes to compute $Mv$ for a query vector $v \in \mathbb{F}^n$. This holds even if the data structure is allowed to err
with probability $1 - 1/|\mathbb{F}|^{n/4}$ on average over all pairs of a matrix $M$ and a query vector $v$.

For the natural range of parameters $|\mathbb{F}| = n^{\Omega(1)}$ and $w = \Theta(\lg|\mathbb{F}|)$, the lower bound simplifies to
$t = \Omega(\min\{n\lg|\mathbb{F}|/\lg(S/n^2), n^2\})$. For linear space, this is $t = \Omega(\min\{n\lg|\mathbb{F}|, n^2\})$. As the size of the field $\mathbb{F}$
tends to $2^n$, the lower bound says that any data structure with near-linear space has to "read" the entire
matrix to compute $Mv$, even if allowed to err with overwhelming probability.

We give the proof in the following. The proof is based on an encoding argument. 


Encoding Argument. Consider a randomized data structure $\mathcal{D}$ for online matrix-vector multiplication
using $S$ cells of $w$ bits of space. Assume the data structure answers queries in $t$ probes with error probability
$1 - 1/|\mathbb{F}|^{n/4}$ on average over all pairs of an input matrix $M$ and query vector $v$. Now consider the following
hard distribution: The input matrix is a uniform random matrix $M$ in $\mathbb{F}^{n \times n}$ and the query to be answered
after preprocessing is a uniform random $v \in \mathbb{F}^n$. By fixing the random coins of $\mathcal{D}$, there exists a deterministic
data structure $\mathcal{D}^*$ with space $S$ cells of $w$ bits, query time $t$ and error probability $1 - 1/|\mathbb{F}|^{n/4}$ over the hard
distribution. Using Markov's inequality, we conclude that there must be a family of matrices $\mathcal{M} \subseteq \mathbb{F}^{n \times n}$,
with

$$|\mathcal{M}| \ge |\mathbb{F}|^{n^2}\left(1 - \frac{1 - 1/|\mathbb{F}|^{n/4}}{1 - 1/|\mathbb{F}|^{n/2}}\right) \ge |\mathbb{F}|^{n^2 - n/2},$$

such that for every matrix $M \in \mathcal{M}$, $\mathcal{D}^*$ answers at least $|\mathbb{F}|^n/|\mathbb{F}|^{n/2} = |\mathbb{F}|^{n/2}$ of the possible query vectors $v$
correctly after having preprocessed $M$. To derive the lower bound, we show that $\mathcal{D}^*$ can be used to efficiently
encode every matrix $M \in \mathcal{M}$ into a bit string with length depending on $t$, $S$, $w$ and $|\mathbb{F}|$. If every $M \in \mathcal{M}$ can
be uniquely recovered from these bit strings, we know that at least one of the bit strings must have length
$\lg|\mathcal{M}| \ge (n^2 - n/2)\lg|\mathbb{F}|$, resulting in a lower bound trade-off for $t$, $S$, $w$ and $|\mathbb{F}|$.

To encode a matrix $M \in \mathcal{M}$, we do as follows:

1. Construct $\mathcal{D}^*$ on $M$. This gives a memory representation consisting of $S$ cells of $w$ bits. Now iterate
over all vectors $v \in \mathbb{F}^n$ and collect the subset $V$ consisting of those vectors $v$ for which $\mathcal{D}^*$ does not err
when answering $v$ after having preprocessed $M$. Since $M \in \mathcal{M}$, we know $|V| \ge |\mathbb{F}|^{n/2}$.

2. Interpret every vector $v \in \mathbb{F}^n$ as an integer $f(v)$ in the range $[|\mathbb{F}|^n] = \{0, \ldots, |\mathbb{F}|^n - 1\}$ in the natural way
$f(v) = \sum_{i=1}^{n} v_i |\mathbb{F}|^{i-1}$. Consider the random hash function $h_a : [|\mathbb{F}|^n] \to [|\mathbb{F}|^{n/8}]$ with $h_a(x) = (x + a) \bmod |\mathbb{F}|^{n/8}$
for a uniform random $a \in [|\mathbb{F}|^{n/8}]$. Let $W_a^0$ denote the set of all vectors $v \in \mathbb{F}^n$ for which
$h_a(f(v)) = 0$. We always have $|W_a^0| = |\mathbb{F}|^{7n/8}$. Furthermore, $E_a[|W_a^0 \cap V|] = |V|/|\mathbb{F}|^{n/8} \ge |\mathbb{F}|^{3n/8}$.
Hence there exists a choice of $a \in [|\mathbb{F}|^{n/8}]$ such that $|W_a^0 \cap V| \ge |\mathbb{F}|^{3n/8}$. The first part of the encoding
is such a value $a^* \in [|\mathbb{F}|^{n/8}]$, costing $3n\lg|\mathbb{F}|/8$ bits.

3. Having chosen $a^*$, we now consider every set $C$ of $\Delta = n^2 \lg |\mathbb{F}|/(1024w)$ memory cells in the data structure. For each such set $C$, let $Q_C$ denote the set of query vectors $v \in \mathbb{F}^n$ for which $\mathcal{D}^*$ probes only cells in $C$ when answering $v$ after having preprocessed $M$. We let $C^*$ be the set of $\Delta$ memory cells for which $|Q_{C^*} \cap W_{a^*}^0 \cap V|$ is largest. We then write down the addresses and contents of cells in $C^*$. This costs no more than $\Delta(w + \lg S) \le 2\Delta w = n^2 \lg |\mathbb{F}|/512$ bits.

4. We now consider the set of query vectors $V^* = Q_{C^*} \cap W_{a^*}^0 \cap V$. Since any $k$-dimensional subspace of $\mathbb{F}^n$ contains at most $|\mathbb{F}|^k$ vectors, we know that $\dim(\mathrm{span}(V^*)) \ge \lg_{|\mathbb{F}|} |V^*|$. We can thus find a set of $\lg |V^*|/\lg |\mathbb{F}|$ linearly independent vectors in $V^*$. We write down such a set of vectors $U$. Since $U \subseteq W_{a^*}^0$, we can specify $U$ as indices into $W_{a^*}^0$, costing only $(\lg |V^*|/\lg |\mathbb{F}|)(7n \lg |\mathbb{F}|/8) = 7n \lg |V^*|/8$ bits.

5. Finally we initialize a set of vectors $X = \emptyset$ and iterate through all vectors in $\mathbb{F}^n$ in lexicographic order. For each such vector $x$, we check if $x \in \mathrm{span}(U \cup X)$. If so, we continue to the next vector. If not, we add $x$ to $X$. This terminates with $\dim(\mathrm{span}(U \cup X)) = |U| + |X| = n$. In the last step of our encoding procedure, we examine each row vector $m_i$ of $M$ in turn. For the $i$'th row vector, we compute the inner product $\langle m_i, x \rangle$ over $\mathbb{F}$ for every $x \in X$. We write down each of these $|X|$ inner products for a total of $n|X| \lg |\mathbb{F}|$ bits. This concludes the description of the encoding procedure.
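
The averaging step behind the choice of $a^*$ in step 2 can be illustrated on a toy instance. The following is a minimal sketch, not part of the original argument: it assumes field elements are labelled by the integers 0..q-1, uses a hypothetical parameter `r` in place of n/8, and picks V at random instead of deriving it from a data structure. All function names are illustrative.

```python
import random
from itertools import product

def vec_to_int(v, q):
    """f(v): read the coordinates of v (integers in 0..q-1) as base-q digits."""
    x = 0
    for c in v:
        x = x * q + c
    return x

def bucket(a, q, n, r):
    """W_a^0: all v in {0..q-1}^n with (f(v) + a) mod q^r == 0."""
    return {v for v in product(range(q), repeat=n)
            if (vec_to_int(v, q) + a) % q ** r == 0}

def best_shift(V, q, n, r):
    """Return (a, |W_a^0 & V|) for a shift a that maximises the intersection."""
    sizes = [(a, len(bucket(a, q, n, r) & V)) for a in range(q ** r)]
    return max(sizes, key=lambda pair: pair[1])

# Toy parameters (assumptions): q stands in for |F|, r for n/8.
q, n, r = 3, 4, 1
random.seed(1)
all_vectors = list(product(range(q), repeat=n))
V = set(random.sample(all_vectors, q ** (n // 2)))   # |V| = |F|^(n/2)
a_star, hits = best_shift(V, q, n, r)
```

Because the buckets for different shifts partition the whole vector space, the best shift always captures at least a 1/q^r fraction of V, which is the existence claim used in step 2.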

Before presenting the decoding procedure, we make a few remarks regarding the ideas in the above encoding procedure. Intuitively each query in $U$ can be answered solely from the contents of $C^*$. Furthermore, the query vectors in $U$ are linearly independent and thus in total reveal $|U| \lg |\mathbb{F}|$ bits of information about each of the $n$ rows of $M$. The hashing trick in steps 2-3 ensures that the vectors in $U$ can be described using only $\lg |W_{a^*}^0| = 7n \lg |\mathbb{F}|/8$ bits each. Thus each vector reveals $n \lg |\mathbb{F}|/8$ more bits of information about $M$ than it costs to describe. Thus $U$ will have to be small, leading to a space-time trade-off.

We now show how M can be recovered from the encoding produced above. The decoding procedure is 
as follows: 

1. From the bits written during steps 2 and 3 of the encoding procedure, we recover $a^*$ and $C^*$. From $a^*$ we also obtain $W_{a^*}^0$.

2. Now that $W_{a^*}^0$ has been recovered, we use the bits written in step 4 of the encoding procedure to recover the set of vectors $U$. We now run the query algorithm of $\mathcal{D}^*$ for every $v \in U$. Since $U \subseteq Q_{C^*}$, the query algorithm only probes cells in $C^*$ when answering these queries. Since we have the addresses and contents of all cells in $C^*$, we thus obtain $Mv$ for every $v \in U$, i.e. we know $\langle m_i, v \rangle$ for every row vector $m_i$ and every $v \in U$.

3. Finally we initialize an empty set of vectors $X = \emptyset$ and iterate through all vectors $x \in \mathbb{F}^n$ in lexicographic order. For each vector $x$, we check if $x \in \mathrm{span}(U \cup X)$. If so, we continue to the next vector. If not, we add $x$ to $X$ and continue. This recovers the exact same set of vectors $X$ as in step 5 of the encoding procedure. From the bits written during step 5 of the encoding procedure, we obtain $\langle m_i, x \rangle$ for every $x \in X$. Since $\dim(\mathrm{span}(U \cup X)) = n$ and we know $\langle m_i, u \rangle$ for every $u \in U \cup X$, this uniquely determines $m_i$, which completes the decoding procedure.
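
The last decoding step relies on two facts: the greedy lexicographic completion of $U$ is deterministic, and a row is uniquely determined by its inner products with a basis. The sketch below illustrates both over a prime field $\mathbb{F}_p$. It is an illustration under assumptions: the paper allows any finite field, whereas this code handles only prime p, and the names `row_reduce`, `complete_basis` and `recover_row` are hypothetical.

```python
from itertools import product

def row_reduce(rows, p):
    """Reduced row echelon form over F_p (p prime); returns (rows, rank)."""
    rows = [[x % p for x in r] for r in rows]
    rank = 0
    for col in range(len(rows[0]) if rows else 0):
        piv = next((i for i in range(rank, len(rows)) if rows[i][col]), None)
        if piv is None:
            continue
        rows[rank], rows[piv] = rows[piv], rows[rank]
        inv = pow(rows[rank][col], -1, p)          # modular inverse (Python 3.8+)
        rows[rank] = [x * inv % p for x in rows[rank]]
        for i in range(len(rows)):
            if i != rank and rows[i][col]:
                f = rows[i][col]
                rows[i] = [(x - f * y) % p for x, y in zip(rows[i], rows[rank])]
        rank += 1
    return rows, rank

def complete_basis(U, n, p):
    """Greedy lexicographic completion X; assumes U is linearly independent."""
    X = []
    for x in product(range(p), repeat=n):
        if row_reduce(U + X + [list(x)], p)[1] > len(U) + len(X):
            X.append(list(x))
    return X

def recover_row(basis, products, p):
    """Solve <m, b> = c_b for every basis vector b; the solution m is unique."""
    n = len(basis)
    reduced, rank = row_reduce([b + [c] for b, c in zip(basis, products)], p)
    assert rank == n
    return [reduced[i][n] for i in range(n)]

p, n = 3, 3
U = [[1, 0, 2], [0, 1, 1]]            # queries answered from C* (toy values)
X = complete_basis(U, n, p)           # same X on the encoder and decoder side
m = [2, 1, 0]                         # a hidden row of M
basis = U + X
products = [sum(a * b for a, b in zip(m, v)) % p for v in basis]
recovered = recover_row(basis, products, p)
```

Since `complete_basis` depends only on $U$, the decoder can rebuild $X$ without any extra bits, which is why only the $|X|$ inner products per row need to be stored.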

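Analysis. Above we argued that the above procedures allow us to encode and decode every matrix $M \in \mathcal{M}$ into a bit string. Thus there must be a matrix $M \in \mathcal{M}$ for which the bit string produced has length at least $\lg |\mathcal{M}| \ge (n^2 - n/2) \lg |\mathbb{F}|$. But the encoding produced has length

\[ 3n \lg |\mathbb{F}|/8 + n^2 \lg |\mathbb{F}|/512 + 7n \lg |V^*|/8 + n|X| \lg |\mathbb{F}| \]

bits. Since $|U| = \lg |V^*|/\lg |\mathbb{F}|$, we have $|X| = n - |U| = n - \lg |V^*|/\lg |\mathbb{F}|$ and we conclude that we must have

\[ (n^2 - n/2) \lg |\mathbb{F}| \le 3n \lg |\mathbb{F}|/8 + n^2 \lg |\mathbb{F}|/512 + 7n \lg |V^*|/8 + n(n - \lg |V^*|/\lg |\mathbb{F}|) \lg |\mathbb{F}| \]
\[ = 3n \lg |\mathbb{F}|/8 + n^2 \lg |\mathbb{F}|/512 + n^2 \lg |\mathbb{F}| - n \lg |V^*|/8. \]

This implies

\[ n \lg |V^*|/8 \le 7n \lg |\mathbb{F}|/8 + n^2 \lg |\mathbb{F}|/512 \;\Rightarrow \]
\[ \lg |V^*| \le 7 \lg |\mathbb{F}| + n \lg |\mathbb{F}|/64 \;\Rightarrow \]
\[ \lg |V^*| \le n \lg |\mathbb{F}|/32, \]

where the last step holds once $n \ge 448$.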


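Since $C^*$ was chosen such that $|V^*|$ was largest possible, we know by averaging that if the data structure has query time $t \le \Delta/2$, then

\[ |V^*| \ge |W_{a^*}^0 \cap V| \cdot \frac{\binom{S-t}{\Delta-t}}{\binom{S}{\Delta}} \]
\[ = \frac{|W_{a^*}^0 \cap V| \, (S-t)! \, (S-\Delta)! \, \Delta!}{S! \, (S-\Delta)! \, (\Delta-t)!} \]
\[ = \frac{|W_{a^*}^0 \cap V| \, (S-t)! \, \Delta!}{S! \, (\Delta-t)!} \]
\[ \ge \frac{|W_{a^*}^0 \cap V| \, (\Delta-t)^t}{S^t}. \]

Taking logs, and using $|W_{a^*}^0 \cap V| \ge |\mathbb{F}|^{3n/8}$ and $\Delta - t \ge \Delta/2$, we conclude that we must have

\[ 3n \lg |\mathbb{F}|/8 + t \lg\left( \frac{n^2 \lg |\mathbb{F}|}{2048 S w} \right) \le n \lg |\mathbb{F}|/32 \;\Rightarrow \]
\[ t \lg\left( \frac{2048 S w}{n^2 \lg |\mathbb{F}|} \right) \ge 11 n \lg |\mathbb{F}|/32 \;\Rightarrow \]
\[ t = \Omega\left( \frac{n \lg |\mathbb{F}|}{\lg\left( \frac{Sw}{n^2 \lg |\mathbb{F}|} \right)} \right). \]

Since the above calculation needed $t \le \Delta/2 = n^2 \lg |\mathbb{F}|/(2048w)$, we conclude that

\[ t = \Omega\left( \min\left\{ \frac{n \lg |\mathbb{F}|}{\lg\left( \frac{Sw}{n^2 \lg |\mathbb{F}|} \right)}, \; \frac{n^2 \lg |\mathbb{F}|}{w} \right\} \right). \]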
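
The averaging step in the analysis uses the fact that a fixed set of $t$ cells is contained in a $\binom{S-t}{\Delta-t}/\binom{S}{\Delta}$ fraction of all $\Delta$-subsets of the $S$ cells, and that this fraction is at least $((\Delta-t)/S)^t$. A small exact check of that inequality, with toy values of `S` and `delta` chosen only for illustration:

```python
from fractions import Fraction
from math import comb

def covering_fraction(S, delta, t):
    """Fraction of delta-subsets of S cells that contain a fixed set of t cells."""
    return Fraction(comb(S - t, delta - t), comb(S, delta))

def simple_lower_bound(S, delta, t):
    """The bound ((delta - t) / S)^t used in the analysis."""
    return Fraction(delta - t, S) ** t

S, delta = 40, 10   # toy sizes (assumptions), far smaller than in the proof
checks = [
    covering_fraction(S, delta, t) >= simple_lower_bound(S, delta, t)
    >= Fraction(delta, 2 * S) ** t
    for t in range(delta // 2 + 1)
]
```

The second comparison in each chain is where the requirement $t \le \Delta/2$ enters: it guarantees $\Delta - t \ge \Delta/2$.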
4 Dynamic Online Matrix-Vector Multiplication

In this section, we prove a lower bound for the dynamic online matrix-vector multiplication problem: Maintain an $n \times n$ matrix $M$ with coefficients from a finite field $\mathbb{F}$, such that we can efficiently support entry updates of the form $m_{i,j} \leftarrow x$ for a row index $i$, column index $j$ and an $x \in \mathbb{F}$. The matrix $M$ is initialized to the all 0's matrix, and at any time, we may ask a query $v \in \mathbb{F}^n$ and the data structure must return $Mv$. We prove the following lower bound for this problem:

Theorem 4. Any cell probe data structure for the dynamic online matrix-vector multiplication problem on an $n \times n$ matrix $M$, with $w$ bit cells and worst case update time $t_u$, must use

\[ t_q = \Omega\left( \min\left\{ \frac{n \lg |\mathbb{F}| \lg(n/w)}{\lg^2\left( \frac{t_u w}{\lg |\mathbb{F}|} \right)}, \; \frac{n^2 \lg |\mathbb{F}|}{w} \right\} \right) \]

cell probes to compute $Mv$ for a query vector $v \in \mathbb{F}^n$. This holds if the data structure errs with probability no more than 1/3 when answering any query vector after any sequence of updates.

To prove the theorem, we follow the general approach ventured in m- The first step is to define a hard 
distribution. 


Hard Distribution. For the dynamic online matrix-vector multiplication problem, our hard distribution is as follows: Let $(i_{n^2}, j_{n^2}, x_{n^2}), (i_{n^2-1}, j_{n^2-1}, x_{n^2-1}), \dots, (i_1, j_1, x_1)$ be a sequence of updates to the matrix $M$. The triple $(i_{n^2}, j_{n^2}, x_{n^2})$ is the first update and $(i_1, j_1, x_1)$ is the last update. A triple $(i_k, j_k, x_k)$ corresponds to the update operation $m_{i_k, j_k} \leftarrow x_k$. The values $x_k$ are uniform random and independent in $\mathbb{F}$. The sequence of row and column indices $i_k, j_k$ is some fixed sequence of well-spread indices, where well-spread is defined as follows:


Definition 1. A sequence of row and column indices $(i_{n^2}, j_{n^2}), \dots, (i_1, j_1)$ is well-spread if:

• All pairs $(i_k, j_k)$ are distinct.

• For every index $n^{4/3} \le r \le n^2$ and every set of $n/2$ row indices $S \subseteq \{1, \dots, n\}$, there exists a subset $S^* \subseteq S$ with $|S^*| \le 8n^2/r$, such that $\left| \bigcup_{k \le r \,:\, i_k \in S^*} \{j_k\} \right| \ge n/4$.


Note that our hard distribution only needs that the sequence of update indices is well-spread. We do not care about the particular indices in the sequence. A well-spread sequence of indices basically guarantees that in every big enough set of rows (size at least $n/2$), there exists a small subset of rows ($S^*$), such that the indices updated in these rows "cover" at least $n/4$ columns. This must be true even if considering only the last $r$ updates for any $n^{4/3} \le r \le n^2$. The following lemma shows that such a well-spread sequence indeed exists:


Lemma 2. There exists a well-spread sequence $(i_{n^2}, j_{n^2}), \dots, (i_1, j_1)$ of row and column indices.

The proof of the lemma is a rather straightforward counting argument. We thus defer it to Section 4.3.
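
For very small $n$, the well-spread property of Definition 1 can be checked by brute force, which may help in reading the definition. The sketch below is illustrative only: rows and columns are 0-based, `pairs_by_k[k-1]` holds $(i_k, j_k)$ so that the last $r$ updates are `pairs_by_k[:r]`, and the lower end of the range for $r$ is passed in as the parameter `r_min` (a hypothetical name) instead of being fixed in the code.

```python
from itertools import combinations

def covered_columns(pairs_by_k, r, rows):
    """Columns j_k touched by the last r updates (k <= r) whose row i_k is in rows."""
    return {j for (i, j) in pairs_by_k[:r] if i in rows}

def is_well_spread(pairs_by_k, n, r_min):
    """Brute-force check of the two well-spread conditions for r_min <= r <= n^2."""
    if len(set(pairs_by_k)) != len(pairs_by_k):
        return False                                  # pairs must be distinct
    for r in range(r_min, n * n + 1):
        cap = (8 * n * n) // r                        # allowed size of S*
        for S in combinations(range(n), n // 2):
            size = min(cap, len(S))
            # Coverage is monotone in S*, so only maximal-size subsets matter.
            if not any(4 * len(covered_columns(pairs_by_k, r, set(T))) >= n
                       for T in combinations(S, size)):
                return False
    return True

n = 4
col_major = [(i, j) for j in range(n) for i in range(n)]   # rows interleaved
row_major = [(i, j) for i in range(n) for j in range(n)]   # one row at a time
good = is_well_spread(col_major, n, r_min=7)
bad = is_well_spread(row_major, n, r_min=7)
```

In the row-major order the last 7 updates touch only two rows, so a set $S$ made of the other two rows covers no column at all; the column-major order spreads the most recent updates over all rows.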

Following the sequence of updates $(i_{n^2}, j_{n^2}, x_{n^2}), (i_{n^2-1}, j_{n^2-1}, x_{n^2-1}), \dots, (i_1, j_1, x_1)$, we ask a uniform random query $v \in \mathbb{F}^n$. This concludes the description of our hard distribution.

By fixing the random coins, a randomized data structure $\mathcal{D}$ for dynamic online matrix-vector multiplication, with $w$ bit cells, worst case update time $t_u$, query time $t_q$ and error probability at most 1/3 on any sequence of updates followed by a query, yields a deterministic data structure $\mathcal{D}^*$ with $w$ bit cells, worst case update time $t_u$, query time $t_q$ and error probability 1/3 over the hard distribution. We thus continue by proving a lower bound for such deterministic data structures.
Chronogram Approach. Following [TS], we partition the random updates $(i_{n^2}, j_{n^2}, x_{n^2}), \dots, (i_1, j_1, x_1)$ into epochs of roughly $\beta^\ell$ updates, where $\ell = 1, \dots, \lg_\beta n^2$ and $\beta \ge 2$ is a parameter to be fixed later. The $\ell$'th epoch consists of the $\beta^\ell - \beta^{\ell-1}$ updates $(i_{\beta^\ell - 1}, j_{\beta^\ell - 1}, x_{\beta^\ell - 1}), \dots, (i_{\beta^{\ell-1}}, j_{\beta^{\ell-1}}, x_{\beta^{\ell-1}})$. At the end of epoch 1, the uniform random query $v \in \mathbb{F}^n$ is asked.
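
As a quick illustration of the epoch structure, the sketch below maps an update index $k$ to its epoch. It assumes that epoch $\ell$ consists of the updates with indices $\beta^{\ell-1} \le k \le \beta^\ell - 1$, which is consistent with epochs $\ell, \dots, 1$ together covering all $k \le \beta^\ell - 1$; the function names are hypothetical.

```python
def epoch_of(k, beta):
    """Epoch containing update k, where k = 1 is the last update before the query."""
    ell = 1
    while k >= beta ** ell:
        ell += 1
    return ell

def epoch_range(ell, beta):
    """Update indices of epoch ell: beta^(ell-1) <= k <= beta^ell - 1."""
    return range(beta ** (ell - 1), beta ** ell)

beta = 4
sizes = [len(epoch_range(ell, beta)) for ell in (1, 2, 3)]
# The three most recent epochs together cover updates 1 .. beta^3 - 1.
covered = sorted(k for ell in (1, 2, 3) for k in epoch_range(ell, beta))
```

Epoch sizes grow geometrically towards the past, so the most recent epochs are small and the oldest epoch contains almost all updates.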

When a deterministic data structure $\mathcal{D}^*$ processes the (random) sequence of updates

\[ \Pi = (i_{n^2}, j_{n^2}, x_{n^2}), \dots, (i_1, j_1, x_1), \]

we say that a memory cell belongs to epoch $\ell$ if that memory cell's contents were last updated while processing the updates of epoch $\ell$, i.e. it was updated during epoch $\ell$ and it was not updated during epochs $\ell - 1, \dots, 1$. We let $C_\ell(\Pi)$ denote the set of memory cells belonging to epoch $\ell$ after processing $\Pi$. If $\mathcal{D}^*$ has worst case update time $t_u$, we have $|C_\ell(\Pi)| \le \beta^\ell t_u$. We also define the set of probed cells $P(\Pi, v)$ as the set of memory cells probed when $\mathcal{D}^*$ answers the (random) query $v$ after processing the updates $\Pi$. With these definitions, the main technical challenge is to prove the following:

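Lemma 3. If $\mathcal{D}^*$ is a deterministic data structure for dynamic online matrix-vector multiplication, with $w$ bit cells, worst case update time $t_u$ and error probability 1/3 over the hard distribution, then for all epochs $(4/6) \lg_\beta n^2 \le \ell \le \lg_\beta n^2$, we have

\[ \mathbb{E}_{\Pi, v}\left[ |C_\ell(\Pi) \cap P(\Pi, v)| \right] = \Omega\left( \min\left\{ \frac{n \lg |\mathbb{F}|}{\lg\left( \frac{t_u w}{\lg |\mathbb{F}|} \right)}, \; \frac{\beta^\ell \lg |\mathbb{F}|}{w} \right\} \right), \]

assuming $\beta = 1024 t_u w/\lg |\mathbb{F}|$.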

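Before proving Lemma 3, we show that it implies Theorem 4. By the disjointness of the cell sets $C_{\lg_\beta n^2}(\Pi), \dots, C_1(\Pi)$ we always have $|P(\Pi, v)| \ge \sum_\ell |P(\Pi, v) \cap C_\ell(\Pi)|$. Thus by linearity of expectation we get

\[ \mathbb{E}_{\Pi, v}[|P(\Pi, v)|] \ge \sum_{\ell=1}^{\lg_\beta n^2} \mathbb{E}_{\Pi, v}[|P(\Pi, v) \cap C_\ell(\Pi)|]. \]

By Lemma 3, this sum is at least

\[ \mathbb{E}_{\Pi, v}[|P(\Pi, v)|] = \Omega\left( \sum_{\ell = (4/6) \lg_\beta n^2}^{\lg_\beta n^2} \min\left\{ \frac{n \lg |\mathbb{F}|}{\lg\left( \frac{t_u w}{\lg |\mathbb{F}|} \right)}, \; \frac{\beta^\ell \lg |\mathbb{F}|}{w} \right\} \right). \]

If $nw/\lg(t_u w/\lg |\mathbb{F}|) \ge n^2$, we get a lower bound of $\Omega(n^2 \lg |\mathbb{F}|/w)$ from Lemma 3 applied only to epoch $\lg_\beta n^2$. If $nw/\lg(t_u w/\lg |\mathbb{F}|) \le n^{4/3}$, the first term in the min expression is smallest for every index $\ell$ in the sum and we get a lower bound of $\Omega(n \lg |\mathbb{F}| \lg_\beta n/\lg(t_u w/\lg |\mathbb{F}|))$. If $n^{4/3} < nw/\lg(t_u w/\lg |\mathbb{F}|) < n^2$, there are $\lg_\beta n^2 - \lg_\beta(nw/\lg(t_u w/\lg |\mathbb{F}|)) \ge \lg_\beta(n/w)$ terms where the first term in the min expression is smallest, giving a lower bound of $\Omega(n \lg |\mathbb{F}| \lg_\beta(n/w)/\lg(t_u w/\lg |\mathbb{F}|))$. Since $\beta = 1024 t_u w/\lg |\mathbb{F}|$, this proves Theorem 4.

The next section is devoted to proving Lemma 3.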


4.1 Probes to Epoch $\ell$ (Proof of Lemma 3)

In this section we prove Lemma 3. Let $\mathcal{D}^*$ be a deterministic data structure for dynamic online matrix-vector multiplication with $w$ bit cells, worst case update time $t_u$ and error probability 1/3 over the hard distribution. Let $\ell$ be an epoch satisfying $(4/6) \lg_\beta n^2 \le \ell \le \lg_\beta n^2$. Our goal is to prove that

\[ \mathbb{E}_{\Pi, v}\left[ |C_\ell(\Pi) \cap P(\Pi, v)| \right] = \Omega\left( \min\left\{ \frac{n \lg |\mathbb{F}|}{\lg\left( \frac{t_u w}{\lg |\mathbb{F}|} \right)}, \; \frac{\beta^\ell \lg |\mathbb{F}|}{w} \right\} \right), \]

assuming $\beta = 1024 t_u w/\lg |\mathbb{F}|$. Our proof is based on an encoding argument. More specifically, we assume for contradiction that

\[ \mathbb{E}_{\Pi, v}\left[ |C_\ell(\Pi) \cap P(\Pi, v)| \right] = o\left( \min\left\{ \frac{n \lg |\mathbb{F}|}{\lg\left( \frac{t_u w}{\lg |\mathbb{F}|} \right)}, \; \frac{\beta^\ell \lg |\mathbb{F}|}{w} \right\} \right) \qquad (1) \]

and use this assumption to encode the random updates of epochs $\ell, \dots, 1$ in less than $H(x_{\beta^\ell - 1} \cdots x_1) = (\beta^\ell - 1) \lg |\mathbb{F}|$ bits in expectation. Here $H(\cdot)$ denotes binary Shannon entropy. By Shannon's source coding theorem, this is a contradiction.

For reasons that become apparent later, our encoding and decoding procedures will share a random source and also both need access to $x_{n^2}, \dots, x_{\beta^\ell}$. The randomness we need is a list $\Gamma_1, \dots, \Gamma_m$ of $m$ independently chosen sets (with $\lg m \le nk \lg |\mathbb{F}|/512$, the bound used in step 3 of the encoding procedure), where each $\Gamma_i$ is a uniform random set of $k = (\beta^\ell - 1)/n$ vectors from $\mathbb{F}^n$. Observe that $x_{\beta^\ell - 1} \cdots x_1$ are independent of $x_{n^2} \cdots x_{\beta^\ell}$ and $\Gamma_1 \cdots \Gamma_m$, thus $H(x_{\beta^\ell - 1} \cdots x_1 \mid x_{n^2} \cdots x_{\beta^\ell} \Gamma_1 \cdots \Gamma_m) = (\beta^\ell - 1) \lg |\mathbb{F}|$ and we still reach a contradiction if we are able to encode $x_{\beta^\ell - 1} \cdots x_1$ in less than $(\beta^\ell - 1) \lg |\mathbb{F}|$ bits in expectation when the encoder and decoder share $x_{n^2} \cdots x_{\beta^\ell}$ and $\Gamma_1 \cdots \Gamma_m$.

The encoding argument will show that, assuming (1), we can often find a small set of queries, all probing the same small set of cells, and that collectively reveal a lot of information about the updates of epochs $\ell, \dots, 1$. To this end, we need to formalize exactly how these queries reveal a lot of information. We thus need a few definitions:


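Definition 2. Let $S \subseteq \{1, \dots, n\}$ be a set of indices and let $v \in \mathbb{F}^n$ be a vector. Then $v^{|S}$ is the vector with $|S|$ entries, one for each index $i \in S$. The coordinate in $v^{|S}$ corresponding to an index $i \in S$ has the value $v(i)$.

Definition 3. Let $S_1, \dots, S_n \subseteq \{1, \dots, n\}$ be $n$ sets of indices and let $v_1, \dots, v_k \in \mathbb{F}^n$ be $k$ vectors. Then the rank sum of $v_1, \dots, v_k$ with respect to $S_1, \dots, S_n$, denoted $\mathcal{RS}(S_1, \dots, S_n, v_1, \dots, v_k)$, is defined as

\[ \mathcal{RS}(S_1, \dots, S_n, v_1, \dots, v_k) := \sum_{i=1}^{n} \dim\left( \mathrm{span}\left( v_1^{|S_i}, \dots, v_k^{|S_i} \right) \right). \]

Definition 4. For $i = 1, \dots, n$ let $R_i^{\le \ell}$ be the set of column indices updated in the $i$'th row during epochs $\ell, \dots, 1$, i.e. $R_i^{\le \ell} = \{ j_k : k \le \beta^\ell - 1 \wedge i_k = i \}$. Let $v_1, \dots, v_k \in \mathbb{F}^n$ be a set of $k$ vectors. Then the rank sum of $v_1, \dots, v_k$ wrt. epoch $\ell$, denoted $\mathcal{RS}^{\le \ell}(v_1, \dots, v_k)$, is defined as:

\[ \mathcal{RS}^{\le \ell}(v_1, \dots, v_k) := \mathcal{RS}(R_1^{\le \ell}, \dots, R_n^{\le \ell}, v_1, \dots, v_k). \]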
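
The rank sum of Definitions 2 to 4 can be computed directly for small examples. The following sketch works over a prime field $\mathbb{F}_p$ only (an assumption; the definitions apply to any finite field), uses 0-based row and column indices, and stores $(i_k, j_k)$ at `pairs_by_k[k-1]`. All names are illustrative.

```python
def rank_mod_p(rows, p):
    """Rank over F_p (p prime) by Gaussian elimination."""
    rows = [[x % p for x in r] for r in rows]
    rank = 0
    for col in range(len(rows[0]) if rows else 0):
        piv = next((i for i in range(rank, len(rows)) if rows[i][col]), None)
        if piv is None:
            continue
        rows[rank], rows[piv] = rows[piv], rows[rank]
        inv = pow(rows[rank][col], -1, p)          # modular inverse (Python 3.8+)
        rows[rank] = [x * inv % p for x in rows[rank]]
        for i in range(rank + 1, len(rows)):
            f = rows[i][col]
            rows[i] = [(x - f * y) % p for x, y in zip(rows[i], rows[rank])]
        rank += 1
    return rank

def restrict(v, index_set):
    """The restriction of v to the coordinates in index_set, in increasing order."""
    return [v[i] for i in sorted(index_set)]

def rank_sum(index_sets, vectors, p):
    """Sum over i of the dimension of the span of the vectors restricted to S_i."""
    return sum(rank_mod_p([restrict(v, S) for v in vectors], p)
               for S in index_sets)

def epoch_column_sets(pairs_by_k, n, beta, ell):
    """Per row i, the columns j_k with k <= beta^ell - 1 and i_k = i."""
    sets = [set() for _ in range(n)]
    for i, j in pairs_by_k[:beta ** ell - 1]:
        sets[i].add(j)
    return sets

pairs = [(0, 0), (0, 1), (1, 2), (2, 0), (2, 1)]       # toy update indices
R = epoch_column_sets(pairs, n=3, beta=2, ell=2)       # uses updates k = 1, 2, 3
low = rank_sum(R, [[1, 0, 1], [1, 0, 0]], p=2)
high = rank_sum(R, [[1, 0, 1], [0, 1, 1]], p=2)
```

The two query sets differ only in how independent they look on the columns that were actually updated, which is exactly what the rank sum measures.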

Note that $R_i^{\le \ell}$ is not random since we always update the same fixed sequence of indices $(i_k, j_k)$; it is only the values $x_k$ that vary in our hard distribution. With these definitions, it should be intuitive that a set of query vectors $v_1, \dots, v_k$ and their corresponding answers $Mv_1, \dots, Mv_k$ will reveal a lot of information about epochs $\ell, \dots, 1$ if $\mathcal{RS}^{\le \ell}(v_1, \dots, v_k)$ is high. To exploit this in an encoding argument, we show that assumption (1) implies that we can find a small subset of cells in $C_\ell(\Pi)$ that answers a set of queries with high rank sum. The precise details are as follows:

Lemma 4. Let $\ell \ge (4/6) \lg_\beta n^2$ and assume (1). Then with probability at least 1/4 over the choice of $\Pi$, there exists a subset of cells $C'_\ell(\Pi) \subseteq C_\ell(\Pi)$ satisfying:

1. $|C'_\ell(\Pi)| = \beta^\ell \lg |\mathbb{F}|/(1024w)$.

2. There exist at least $|\mathbb{F}|^{nk - o(nk)}$ distinct sets of $k = (\beta^\ell - 1)/n$ query vectors $v_1, \dots, v_k$ for which:

(a) $\mathcal{D}^*$ does not err when answering $v_i$ after the updates $\Pi$ for $i = 1, \dots, k$.

(b) $P(\Pi, v_i) \cap (C_\ell(\Pi) \setminus C'_\ell(\Pi)) = \emptyset$ for $i = 1, \dots, k$.

(c) $\mathcal{RS}^{\le \ell}(v_1, \dots, v_k) \ge nk/32$.
We briefly discuss the main intuition on why Lemma 4 eventually leads to a contradiction to assumption (1): Assuming (1), the lemma says that for a constant fraction of the outcomes of $\Pi$, one can find a relatively small subset $C'_\ell(\Pi)$ of cells in $C_\ell(\Pi)$, where there is a large number of queries that read nothing else from epoch $\ell$ than the cells in $C'_\ell(\Pi)$ (property 2.b). These queries must intuitively "collect" the information they need about epoch $\ell$ from this small set of cells. But property 2.c says that they need a lot of information, in fact even more than the bits in $C'_\ell(\Pi)$ can possibly describe. This is the high level message of the lemma and eventually gives the contradiction.

To not remove focus from bounding the probes to epoch $\ell$, we defer the proof of Lemma 4 to Section 4.2 and instead show how we use it in the encoding argument. So let $\ell \ge (4/6) \lg_\beta n^2$ and assume (1). Under this assumption, we show how to encode and decode $x_{\beta^\ell - 1} \cdots x_1$ in less than $H(x_{\beta^\ell - 1} \cdots x_1 \mid x_{n^2} \cdots x_{\beta^\ell} \Gamma_1 \cdots \Gamma_m) = (\beta^\ell - 1) \lg |\mathbb{F}|$ bits in expectation. Note that we condition on $\Gamma_1 \cdots \Gamma_m$ and $x_{n^2} \cdots x_{\beta^\ell}$, which is shared information between the encoder and decoder.

Encoding Procedure. Given $\Pi = (i_{n^2}, j_{n^2}, x_{n^2}), \dots, (i_1, j_1, x_1)$ to encode, first observe that the indices $i_k$ and $j_k$ are fixed, thus we only need to encode $x_{\beta^\ell - 1} \cdots x_1$. We proceed as follows:

1. We start by running the updates $\Pi$ on $\mathcal{D}^*$. We then check if a cell set $C'_\ell(\Pi) \subseteq C_\ell(\Pi)$ satisfying the properties in Lemma 4 exists. If not, our encoding consists of a 0-bit, followed by a naive encoding of $x_{\beta^\ell - 1} \cdots x_1$, costing at most $1 + \beta^\ell \lg |\mathbb{F}|$ bits. In this case, we terminate the encoding procedure. Note that under assumption (1), this happens with probability at most 3/4.

2. If a cell set $C'_\ell(\Pi) \subseteq C_\ell(\Pi)$ satisfying the properties in Lemma 4 does exist, we let $\mathcal{M}$ denote the family of all sets of $k = (\beta^\ell - 1)/n$ query vectors satisfying 2.a-c in Lemma 4. We have $|\mathcal{M}| \ge |\mathbb{F}|^{nk - o(nk)}$. We then check if one of the sets in $\mathcal{M}$ equals one of the random sets $\Gamma_1, \dots, \Gamma_m$. If not, we also write a 0-bit followed by a naive encoding of $x_{\beta^\ell - 1} \cdots x_1$ and terminate the encoding procedure. This costs at most $1 + \beta^\ell \lg |\mathbb{F}|$ bits. Recalling that $\Gamma_1, \dots, \Gamma_m$ are chosen independently of $\Pi$, we conclude that the probability of using a naive encoding of $x_{\beta^\ell - 1} \cdots x_1$ in either step 1 or 2 is bounded by

\[ 3/4 + \left( 1 - |\mathcal{M}| / \tbinom{|\mathbb{F}|^n}{k} \right)^m \le 3/4 + \exp\left( -m|\mathcal{M}|/|\mathbb{F}|^{nk} \right) \le 3/4 + \exp\left( -|\mathbb{F}|^{\Omega(nk)} \right) < 4/5. \]

3. If we did not terminate and write a naive encoding in either step 1 or 2 above, we have found a $C'_\ell(\Pi)$ satisfying the properties in Lemma 4 as well as an index $i^*$ amongst $\{1, \dots, m\}$ such that the vectors in $\Gamma_{i^*}$ satisfy properties 2.a-c in Lemma 4. We use $\gamma_1, \dots, \gamma_k$ to denote these vectors. We now write a 1-bit, followed by an encoding of $i^*$ and the addresses and contents of cells in $C'_\ell(\Pi)$. This costs $1 + \lg m + |C'_\ell(\Pi)| 2w \le 1 + nk \lg |\mathbb{F}|/512 + \beta^\ell \lg |\mathbb{F}|/512 \le 1 + \beta^\ell \lg |\mathbb{F}|/256$ bits.

4. We then write down the addresses and contents of all cells in $C_{\ell-1}(\Pi), \dots, C_1(\Pi)$. Since the worst case update time is $t_u$ and we chose $\beta = 1024 t_u w/\lg |\mathbb{F}|$, this costs no more than $\sum_{j=1}^{\ell-1} |C_j(\Pi)| 2w \le 4\beta^{\ell-1} t_u w \le \beta^\ell \lg |\mathbb{F}|/256$ bits.

5. In the last step, we iterate through the rows of $M$ from 1 to $n$. For row $i$, we create an initially empty set $X_i$ of vectors in $\mathbb{F}^{|R_i^{\le \ell}|}$. We now iterate through all vectors in $\mathbb{F}^{|R_i^{\le \ell}|}$ in some arbitrary but fixed order. For each such vector $v$, we check whether $v$ is in $\mathrm{span}(\gamma_1^{|R_i^{\le \ell}}, \dots, \gamma_k^{|R_i^{\le \ell}}, X_i)$. If not, we add $v$ to $X_i$. We then continue to the next vector in $\mathbb{F}^{|R_i^{\le \ell}|}$. Once this terminates, we have $\dim(\mathrm{span}(\gamma_1^{|R_i^{\le \ell}}, \dots, \gamma_k^{|R_i^{\le \ell}}, X_i)) = |R_i^{\le \ell}|$ and $|X_i| = |R_i^{\le \ell}| - \dim(\mathrm{span}(\gamma_1^{|R_i^{\le \ell}}, \dots, \gamma_k^{|R_i^{\le \ell}}))$. Letting $m_i$ denote the $i$'th row vector in $M$ after having processed all the updates $\Pi$, we finally compute $\langle m_i^{|R_i^{\le \ell}}, v \rangle$ for each $v \in X_i$. We write down these inner products, costing $|X_i| \lg |\mathbb{F}|$ bits. Summing over all rows $i$, this step costs $\sum_i |X_i| \lg |\mathbb{F}| = \sum_i \left( |R_i^{\le \ell}| - \dim(\mathrm{span}(\gamma_1^{|R_i^{\le \ell}}, \dots, \gamma_k^{|R_i^{\le \ell}})) \right) \lg |\mathbb{F}| \le (\beta^\ell - \mathcal{RS}^{\le \ell}(\gamma_1, \dots, \gamma_k)) \lg |\mathbb{F}| \le (\beta^\ell - nk/32) \lg |\mathbb{F}| = (\beta^\ell - (\beta^\ell - 1)/32) \lg |\mathbb{F}|$ bits.
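Examining the above encoding procedure, we see that the expected length of the encoding is upper bounded by

\[ 1 + (4/5)(\beta^\ell \lg |\mathbb{F}|) + (1/5)\left( \beta^\ell \lg |\mathbb{F}| + \beta^\ell \lg |\mathbb{F}|/128 - (\beta^\ell - 1) \lg |\mathbb{F}|/32 \right) \le \]
\[ 1 + \beta^\ell \lg |\mathbb{F}| - (1/5)\left( 3\beta^\ell \lg |\mathbb{F}|/128 - \lg |\mathbb{F}|/32 \right) = \]
\[ H(x_{\beta^\ell - 1} \cdots x_1 \mid x_{n^2} \cdots x_{\beta^\ell} \Gamma_1 \cdots \Gamma_m) - \Omega(\beta^\ell \lg |\mathbb{F}|) < \]
\[ H(x_{\beta^\ell - 1} \cdots x_1 \mid x_{n^2} \cdots x_{\beta^\ell} \Gamma_1 \cdots \Gamma_m). \]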
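
The arithmetic behind the expected-length bound can be checked exactly. The sketch below writes $B$ for $\beta^\ell \lg |\mathbb{F}|$ and $L$ for $\lg |\mathbb{F}|$ (both shorthand introduced here, not in the original), takes the naive encoding with probability 4/5 and the compact one with probability 1/5, and compares the result with the entropy, assumed here to be $(\beta^\ell - 1) \lg |\mathbb{F}| = B - L$.

```python
from fractions import Fraction

def expected_length(B, L):
    """1 + (4/5) * naive + (1/5) * compact, following the steps of the encoding."""
    B, L = Fraction(B), Fraction(L)
    naive = B
    # Steps 3 and 4 cost B/256 each; step 5 costs B - (B - L)/32.
    compact = B + B / 128 - (B - L) / 32
    return 1 + Fraction(4, 5) * naive + Fraction(1, 5) * compact

def savings_form(B, L):
    """The same quantity written as 1 + B - (1/5)(3B/128 - L/32)."""
    B, L = Fraction(B), Fraction(L)
    return 1 + B - Fraction(1, 5) * (3 * B / 128 - L / 32)

B, L = 1000, 1            # toy values (assumptions): beta^l = 1000 over F_2
length = expected_length(B, L)
entropy = Fraction(B - L)
```

The saving of roughly $3B/640$ bits dominates the additive $1 + L$ overhead as soon as $\beta^\ell$ is a sufficiently large constant, which is what the $\Omega(\beta^\ell \lg |\mathbb{F}|)$ term expresses.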

Thus to reach the contradiction to assumption (1), we only need to show that we can recover $x_{\beta^\ell - 1} \cdots x_1$ from the above encoding and $x_{n^2} \cdots x_{\beta^\ell} \Gamma_1 \cdots \Gamma_m$. We do this as follows:

Decoding Procedure.

1. First we check the first bit of the encoding. If this is a 0-bit, we recover $x_{\beta^\ell - 1} \cdots x_1$ directly from the remaining part of the encoding and terminate.

2. If the first bit is a 1-bit, we first recover $i^*$, $C'_\ell(\Pi)$ and $C_{\ell-1}(\Pi), \dots, C_1(\Pi)$. From $i^*$ and $\Gamma_1, \dots, \Gamma_m$ we also recover $\gamma_1, \dots, \gamma_k$. Since the update indices $i_{n^2}, \dots, i_1$ and $j_{n^2}, \dots, j_1$ are fixed, we can also compute $\gamma_1^{|R_i^{\le \ell}}, \dots, \gamma_k^{|R_i^{\le \ell}}$ for all rows $i$.

3. For each row $i$ in turn, we create an initially empty set $X_i$ of vectors in $\mathbb{F}^{|R_i^{\le \ell}|}$. We then iterate through all vectors $v \in \mathbb{F}^{|R_i^{\le \ell}|}$ in the same fixed order as in step 5 of the encoding procedure. For each such $v$, we check if $v$ is in $\mathrm{span}(\gamma_1^{|R_i^{\le \ell}}, \dots, \gamma_k^{|R_i^{\le \ell}}, X_i)$. If not, we add $v$ to $X_i$. We then continue to the next $v$. When this terminates we have reconstructed the sets $X_i$ for all rows $i$.

4. We now process the updates $(i_{n^2}, j_{n^2}, x_{n^2}), \dots, (i_{\beta^\ell}, j_{\beta^\ell}, x_{\beta^\ell})$ on $\mathcal{D}^*$, that is, we process all updates until just before epoch $\ell$. We have thus computed the contents of every memory cell at the time just before epoch $\ell$.

5. We now run the query algorithm of $\mathcal{D}^*$ for $\gamma_1, \dots, \gamma_k$. When answering the query $\gamma_h$, the query algorithm repeatedly asks for a memory cell. When asking for a memory cell, we first check if the cell is amongst $C_{\ell-1}(\Pi), \dots, C_1(\Pi)$. If so, we have its contents and can continue to the next probe. Otherwise we check if the cell is amongst $C'_\ell(\Pi)$. If so, we again have its contents and can continue to the next probe. If not, we know by property 2.b of Lemma 4 that the cell is not in $C_\ell(\Pi)$, i.e. it was not updated during epochs $\ell, \dots, 1$. Thus its contents after processing all of $\Pi$ is the same as after processing only updates preceding epoch $\ell$ and we thus have its contents from the previous step of the decoding procedure (step 4). Since we are able to run the entire query algorithm, we get from property 2.a in Lemma 4 that we recover the vector $M\gamma_h$ for each $h = 1, \dots, k$.

6. Finally, for each row $i = 1, \dots, n$ in $M$, we will recover all the values $x_h$ for which $h \le \beta^\ell - 1$ and $i_h = i$. These are precisely the values corresponding to the updates of the columns given by $R_i^{\le \ell}$. We do this as follows: First, from the query answer $M\gamma_h$ for $h = 1, \dots, k$ we can directly read off $\langle m_i, \gamma_h \rangle$ where $m_i$ is the $i$'th row of $M$ after processing $\Pi$. Since the decoder is given access to $x_{n^2}, \dots, x_{\beta^\ell}$, it knows every entry of $m_i$ outside the columns in $R_i^{\le \ell}$, and subtracting their contribution allows us to compute $\langle m_i^{|R_i^{\le \ell}}, \gamma_h^{|R_i^{\le \ell}} \rangle$. Finally from step 5 of the encoding procedure we also have $\langle m_i^{|R_i^{\le \ell}}, v \rangle$ for every $v \in X_i$. Since $\dim(\mathrm{span}(\gamma_1^{|R_i^{\le \ell}}, \dots, \gamma_k^{|R_i^{\le \ell}}, X_i)) = |R_i^{\le \ell}|$, this uniquely determines $m_i^{|R_i^{\le \ell}}$ and thus the values $x_h$ for which $h \le \beta^\ell - 1$ and $i_h = i$. Doing this for all rows finally recovers $x_{\beta^\ell - 1}, \dots, x_1$.

To summarize, we showed how to encode and decode $x_{\beta^\ell - 1}, \dots, x_1$ in less than $H(x_{\beta^\ell - 1} \cdots x_1)$ bits under assumption (1). This is a contradiction, completing the proof of Lemma 3.
4.2 Finding a Cell Set (Proof of Lemma 4)

In this section, we prove Lemma 4. For this, first define $W(\Pi)$ as the set of query vectors $w$ for which $\mathcal{D}^*$ does not err when answering $w$ after processing $\Pi$, and at the same time,

\[ |C_\ell(\Pi) \cap P(\Pi, w)| \le 16 \, \mathbb{E}_{\Pi, v}\left[ |C_\ell(\Pi) \cap P(\Pi, v)| \right]. \]

Define $\varepsilon(\Pi, v)$ as the indicator random variable taking the value 1 if $\mathcal{D}^*$ errs when answering $v$ after updates $\Pi$ and 0 otherwise. We have $\mathbb{E}_{\Pi, v}[\varepsilon(\Pi, v)] \le 1/3$. By Markov's inequality and a union bound, we conclude that with probability at least 1/4 over the choice of $\Pi$, we have both $\mathbb{E}_v[|C_\ell(\Pi) \cap P(\Pi, v)|] \le 4 \, \mathbb{E}_{\Pi, v}[|C_\ell(\Pi) \cap P(\Pi, v)|]$ and $\mathbb{E}_v[\varepsilon(\Pi, v)] \le 2/3$. We say that the update sequence $\Pi$ is good when this happens. We can again use Markov's inequality and a union bound to conclude that $|W(\Pi)| \ge |\mathbb{F}|^n/12$ when $\Pi$ is good. Now consider a vector $w \in W(\Pi)$ and let $\Delta = \beta^\ell \lg |\mathbb{F}|/(1024w)$, where the $w$ in the denominator is the cell size in bits. For brevity, write $p_w = |C_\ell(\Pi) \cap P(\Pi, w)|$. Observe that there are

\[ \binom{|C_\ell(\Pi)| - p_w}{\Delta - p_w} \]

subsets $C' \subseteq C_\ell(\Pi)$ of $\Delta$ cells, satisfying $P(\Pi, w) \cap (C_\ell(\Pi) \setminus C') = \emptyset$. By averaging, this means that there is a set $C'_\ell(\Pi) \subseteq C_\ell(\Pi)$ of $\Delta$ cells, with at least $|W(\Pi)| \binom{|C_\ell(\Pi)| - p_w}{\Delta - p_w} / \binom{|C_\ell(\Pi)|}{\Delta}$ distinct vectors $w \in W(\Pi)$ satisfying $P(\Pi, w) \cap (C_\ell(\Pi) \setminus C'_\ell(\Pi)) = \emptyset$. This is lower bounded by

\[ \frac{|W(\Pi)| \binom{|C_\ell(\Pi)| - p_w}{\Delta - p_w}}{\binom{|C_\ell(\Pi)|}{\Delta}} = \frac{|W(\Pi)| \, (|C_\ell(\Pi)| - p_w)! \, \Delta! \, (|C_\ell(\Pi)| - \Delta)!}{|C_\ell(\Pi)|! \, (\Delta - p_w)! \, (|C_\ell(\Pi)| - \Delta)!} \ge \frac{|W(\Pi)| \, (\Delta - p_w)^{p_w}}{|C_\ell(\Pi)|^{p_w}}. \]

When $\Pi$ is good, assumption (1) implies $p_w = o(\beta^\ell \lg |\mathbb{F}|/w)$ and we chose $\Delta = \beta^\ell \lg |\mathbb{F}|/(1024w)$. Thus for good $\Pi$, the above is at least

\[ |W(\Pi)| \left( \frac{\Delta}{2|C_\ell(\Pi)|} \right)^{p_w} \ge |W(\Pi)| \left( \frac{\Delta}{2\beta^\ell t_u} \right)^{p_w} = |W(\Pi)| \left( \frac{\lg |\mathbb{F}|}{2048 t_u w} \right)^{p_w}. \]

For good $\Pi$, assumption (1) also implies $p_w = o(n \lg |\mathbb{F}|/\lg(t_u w/\lg |\mathbb{F}|))$. Inserting this above we get

\[ \ge |W(\Pi)| \, |\mathbb{F}|^{-o(n)}. \]

Thus for good $\Pi$, we have at least $|\mathbb{F}|^{n - o(n)}$ vectors $w$ such that $\mathcal{D}^*$ does not err when answering $w$ after processing $\Pi$ and also $P(\Pi, w) \cap (C_\ell(\Pi) \setminus C'_\ell(\Pi)) = \emptyset$. Let the set of these vectors be denoted $Q(\Pi)$. Now let $k = (\beta^\ell - 1)/n$ and consider all $k$-sized subsets of $Q(\Pi)$. There are $\binom{|Q(\Pi)|}{k} \ge |\mathbb{F}|^{nk - o(nk)}$ distinct such sets. We want to show that most of these sets have high rank sum. For this, we have the following lemma:

Lemma 5. If $\ell \ge (4/6) \lg_\beta n^2$ and $k = (\beta^\ell - 1)/n$, then there exist no more than $|\mathbb{F}|^{(27/32)nk}$ distinct sets of $k$ vectors $v_1, \dots, v_k$ in $\mathbb{F}^n$, such that $\mathcal{RS}^{\le \ell}(v_1, \dots, v_k) < nk/32$.

Observe that combined with $|Q(\Pi)| = |\mathbb{F}|^{n - o(n)}$ for good $\Pi$ and $\Pr[\Pi \text{ is good}] \ge 1/4$, Lemma 5 immediately implies Lemma 4. We thus finish our proof of Lemma 4 by proving Lemma 5.
Vectors and Rank Sum (Proof of Lemma 5). In the following, we prove Lemma 5. Let $\ell \ge (4/6) \lg_\beta n^2$ and let $\mathcal{V}$ be the family of all sets of $k = (\beta^\ell - 1)/n$ distinct vectors in $\mathbb{F}^n$ for which $\mathcal{RS}^{\le \ell}(v_1, \dots, v_k) < nk/32$. Our goal is to show that $\mathcal{V}$ has to be small. The intuition for why this is true is as follows: If a set of vectors $v_1, \dots, v_k$ has small rank sum wrt. epoch $\ell$, then for most rows $i$, we have that $\dim(\mathrm{span}(v_1^{|R_i^{\le \ell}}, \dots, v_k^{|R_i^{\le \ell}}))$ is small. This means that, when restricted to the columns in $R_i^{\le \ell}$, the vectors $v_1, \dots, v_k$ must be contained in a low dimensional space. Since there are not too many vectors in a low dimensional space, this gives a bound on the size of $\mathcal{V}$ when restricted to the coordinates in $R_i^{\le \ell}$. From there, our choice of well-spread update indices also comes into play. This property of the update indices basically ensures that different rows put constraints on different coordinates of $v_1, \dots, v_k$, effectively ensuring that if $v_1, \dots, v_k$ is a set in $\mathcal{V}$, then they must lie in a low dimensional subspace no matter which subset of columns we consider. This finally gives a bound on $|\mathcal{V}|$. We formalize this intuition using an encoding argument.


Encoding Argument. Let $v_1, \dots, v_k$ be a set of $k = (\beta^\ell - 1)/n$ vectors from the family $\mathcal{V}$. We present an efficient encoding and decoding procedure for $v_1, \dots, v_k$. This gives a bound on $|\mathcal{V}|$. The encoding procedure is as follows:

1. Given $v_1, \dots, v_k$, we let $I$ be the set of all row indices $i$ for which $\dim(\mathrm{span}(v_1^{|R_i^{\le \ell}}, \dots, v_k^{|R_i^{\le \ell}})) \le k/16$. Since we assumed $\mathcal{RS}^{\le \ell}(v_1, \dots, v_k) < nk/32$ and by definition $\mathcal{RS}^{\le \ell}(v_1, \dots, v_k) = \mathcal{RS}(R_1^{\le \ell}, \dots, R_n^{\le \ell}, v_1, \dots, v_k)$, it follows from Markov's inequality that $|I| \ge n/2$.

2. Since our update indices $(i_h, j_h)$ were chosen as well-spread, and since $\ell \ge (4/6) \lg_\beta n^2$, it follows from Definition 1 that we can find a subset $I^* \subseteq I$ of indices satisfying $|I^*| \le 8n^2/(\beta^\ell - 1) = 8n/k$ and $\left| \bigcup_{h \le \beta^\ell - 1 \,:\, i_h \in I^*} \{j_h\} \right| \ge n/4$. The first part of our encoding is such a set of indices $I^*$. This costs no more than $8n \lg n/k$ bits.

3. For each $i \in I^*$, in increasing order, let $\hat{R}_i^{\le \ell} = R_i^{\le \ell} \setminus \bigcup_{i' \in I^* \,:\, i' < i} R_{i'}^{\le \ell}$. If $\hat{R}_i^{\le \ell}$ is empty, we continue to the next index in $I^*$. Otherwise, find some basis $w_1, \dots, w_d$ for $\mathrm{span}(v_1^{|\hat{R}_i^{\le \ell}}, \dots, v_k^{|\hat{R}_i^{\le \ell}})$. Observe that $d \le k/16$ since $I^* \subseteq I$. We write down $d$ and $w_1, \dots, w_d$. This costs no more than $\lg n + d|\hat{R}_i^{\le \ell}| \lg |\mathbb{F}|$ bits. After having specified $w_1, \dots, w_d$, we also write down $\langle w_j, v_h^{|\hat{R}_i^{\le \ell}} \rangle$ for each pair $j \in \{1, \dots, d\}$ and $h \in \{1, \dots, k\}$, costing $dk \lg |\mathbb{F}|$ bits. Summing over all $i \in I^*$, the total cost of this step is thus at most $|I^*| \lg n + (k/16) \lg |\mathbb{F}| \sum_{i \in I^*} |\hat{R}_i^{\le \ell}| + |I^*|(k/16) k \lg |\mathbb{F}|$ bits. This is at most $8n \lg n/k + (k/16) \lg |\mathbb{F}| \sum_{i \in I^*} |\hat{R}_i^{\le \ell}| + nk \lg |\mathbb{F}|/16$.

4. Finally we let $X$ be the set of column indices not contained in any $R_i^{\le \ell}$ with $i \in I^*$. For each vector $v_1, \dots, v_k$ in turn, we write down each coordinate corresponding to a column in $X$. This costs $k|X| \lg |\mathbb{F}|$ bits.


Next we show that we can recover $v_1, \dots, v_k$ from the encoding produced by the above procedure. This is done as follows:

1. From the bits written in step 2 of the encoding procedure, we recover $I^*$.

2. For each $i \in I^*$, in increasing order, compute $\hat{R}_i^{\le \ell} = R_i^{\le \ell} \setminus \bigcup_{i' \in I^* \,:\, i' < i} R_{i'}^{\le \ell}$ (these sets depend only on $I^*$ and the fixed update indices). If $\hat{R}_i^{\le \ell}$ is empty, we continue with the next index in $I^*$. If not, we read the value $d$ and the basis $w_1, \dots, w_d$ written for this index $i \in I^*$. We then read $\langle w_j, v_h^{|\hat{R}_i^{\le \ell}} \rangle$ for each pair $j \in \{1, \dots, d\}$ and $h \in \{1, \dots, k\}$. Since each $v_h^{|\hat{R}_i^{\le \ell}}$ is in $\mathrm{span}(w_1, \dots, w_d)$, these inner products allow us to recover each coordinate $v_h(c)$ for any column index $c$ in $\hat{R}_i^{\le \ell}$ and any $h = 1, \dots, k$.

3. What remains is to recover all coordinates corresponding to column indices $c$ where $c \notin \hat{R}_i^{\le \ell}$ for any $i \in I^*$. But $\bigcup_{i \in I^*} \hat{R}_i^{\le \ell} = \bigcup_{i \in I^*} R_i^{\le \ell}$, thus we recover the remaining coordinates from the bits written in step 4 of the encoding procedure.

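We have thus shown that we can encode and decode every set of $k$ vectors $v_1, \dots, v_k$ into a string of at most:

\[ 16n \lg n/k + (k/16) \lg |\mathbb{F}| \sum_{i \in I^*} |\hat{R}_i^{\le \ell}| + nk \lg |\mathbb{F}|/16 + k|X| \lg |\mathbb{F}| \]

bits. But $\sum_{i \in I^*} |\hat{R}_i^{\le \ell}| = n - |X|$, and we rewrite $|X| = n - \sum_{i \in I^*} |\hat{R}_i^{\le \ell}|$. The above is thus equal to

\[ nk \lg |\mathbb{F}| + 16n \lg n/k + nk \lg |\mathbb{F}|/16 - (15/16) k \lg |\mathbb{F}| \sum_{i \in I^*} |\hat{R}_i^{\le \ell}|. \]

We also have $\sum_{i \in I^*} |\hat{R}_i^{\le \ell}| = \left| \bigcup_{i \in I^*} R_i^{\le \ell} \right| \ge n/4$. This means that our encoding uses no more than

\[ nk \lg |\mathbb{F}| + 16n \lg n/k + nk \lg |\mathbb{F}|/16 - (15/64) nk \lg |\mathbb{F}| \]
\[ = (53/64) nk \lg |\mathbb{F}| + 16n \lg n/k \]
\[ \le (54/64) nk \lg |\mathbb{F}| \]

bits. We thus conclude $|\mathcal{V}| \le 2^{(27/32) nk \lg |\mathbb{F}|}$.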


4.3 Well-Spread Sequence (Proof of Lemma 2)

Let $r$ be some value in the range $n^{4/3} \le r \le n^2/8$ and let $S$ be a set of $n/2$ row indices. Partition $S$ into $|S|/(n^2/r) = r/(2n)$ consecutive groups $G_1, \dots, G_{r/(2n)}$ of $n^2/r$ rows each. Now let $(i_{n^2}, j_{n^2}), \dots, (i_1, j_1)$ be a uniform random permutation of the pairs of indices in $\{1, \dots, n\} \times \{1, \dots, n\}$. We bound the probability that $\left| \bigcup_{k \le r \,:\, i_k \in G_h} \{j_k\} \right| < n/4$ for all $h = 1, \dots, r/(2n)$. This probability is upper bounded by

\[ \frac{\binom{n}{3n/4}^{r/(2n)} \binom{5n^2/8}{r}}{\binom{n^2}{r}}. \]

To see why, observe that if $\left| \bigcup_{k \le r \,:\, i_k \in G_h} \{j_k\} \right| < n/4$ for all $h = 1, \dots, r/(2n)$, then for all $h = 1, \dots, r/(2n)$, there must exist a set of $3n/4$ column indices $C_h$ such that $\left( \bigcup_{k \le r} \{(i_k, j_k)\} \right) \cap (G_h \times C_h) = \emptyset$. Thus we must have:

\[ \left( \bigcup_{k \le r} \{(i_k, j_k)\} \right) \cap \left( \bigcup_{h=1}^{r/(2n)} (G_h \times C_h) \right) = \emptyset. \qquad (2) \]

At the same time, we have

\[ \left| \bigcup_{h=1}^{r/(2n)} (G_h \times C_h) \right| = (r/(2n))(n^2/r)(3n/4) = 3n^2/8. \]

These two facts explain our upper bound. The term $\binom{n}{3n/4}^{r/(2n)}$ counts the number of possible families of sets $C_1, \dots, C_{r/(2n)}$ that could satisfy (2). For a particular choice of $C_1, \dots, C_{r/(2n)}$, there are only $\binom{5n^2/8}{r}$ choices of $\bigcup_{k \le r} \{(i_k, j_k)\}$ that give the required disjointness. The denominator just counts the total number of choices of $\bigcup_{k \le r} \{(i_k, j_k)\}$. We continue our calculations:

\[ \frac{\binom{n}{3n/4}^{r/(2n)} \, (5n^2/8)! \, r! \, (n^2 - r)!}{(n^2)! \, r! \, (5n^2/8 - r)!} \le \frac{(4e)^{r/8} (5n^2/8)^r}{(n^2 - r + 1)^r}. \]

Since $r \le n^2/8$, this is at most

\[ \frac{(4e)^{r/8} (5n^2/8)^r}{(7n^2/8)^r} = \left( (4e)^{1/8} \cdot \frac{5}{7} \right)^r < 0.97^r. \]

Thus for a particular choice of $n^{4/3} \le r \le n^2/8$ and set $S$ of $n/2$ rows, the probability that there does not exist a subset $S^* \subseteq S$ with $|S^*| \le n^2/r$ and $\left| \bigcup_{k \le r \,:\, i_k \in S^*} \{j_k\} \right| \ge n/4$ is at most $0.97^r$. Since there are less than $n^2 \binom{n}{n/2} \le 2^{2n}$ possible choices for $r$ and $S$, we can union bound over all of them and conclude that there exists a sequence of update indices $(i_{n^2}, j_{n^2}), \dots, (i_1, j_1)$ such that for any $n^{4/3} \le r \le n^2/8$ and any set $S$ of $n/2$ rows, there exists a subset $S^* \subseteq S$ with size $n^2/r$ satisfying $\left| \bigcup_{k \le r \,:\, i_k \in S^*} \{j_k\} \right| \ge n/4$. For the case $n^2/8 < r \le n^2$ and any set $S$ of $n/2$ rows, the same sequence of updates must necessarily have a subset $S^*$ of size at most $8n^2/r$ for which $\left| \bigcup_{k \le r \,:\, i_k \in S^*} \{j_k\} \right| \ge n/4$: the subset guaranteed for $r' = n^2/8$ has size $8 \le 8n^2/r$ and already covers $n/4$ columns among the last $n^2/8$ updates. Thus $(i_{n^2}, j_{n^2}), \dots, (i_1, j_1)$ is well-spread.
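
The constant 0.97 in the failure probability comes from bounding $\binom{n}{3n/4} = \binom{n}{n/4} \le (4e)^{n/4}$ and then multiplying the per-update factors. The short numeric check below is only a sanity check of those two facts with floating point arithmetic; the function name is hypothetical.

```python
from math import comb, e

# Per-update factor in the failure bound: (4e)^(1/8) * 5/7.
base = (4 * e) ** (1 / 8) * 5 / 7

def binomial_bound_holds(n):
    """Check C(n, n/4) <= (4e)^(n/4) for n divisible by 4."""
    return comb(n, n // 4) <= (4 * e) ** (n / 4)

all_hold = all(binomial_bound_holds(n) for n in range(4, 201, 4))
```

Since the factor is strictly below 1, the failure probability for a fixed pair $(r, S)$ decays exponentially in $r$, which is what allows the union bound over all choices of $r$ and $S$ once $r$ is large compared to $n$.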